Amritpal Singh (0851779)¶
Table of Contents
- Installing Required Packages
- General Understanding
- Importing Libraries
- Quick Information of data
- EDA
- Cleaning Data
- Encoding
- Distribution Overview
- Imputing Target Variable
- Base Model Check Function
- Imputation Function
- Transforming Data
- Feature Selection
- Dealing with Outliers
- Splitting Data
- Models
- Metrics Comparison
- Model Comparison
- Marketing Strategies
- Visualization
- References
General understanding of data/files¶
- We have 4 files in total:
  - calendar.csv: 7966127 rows, 7 columns (listing_id, date, available, price, adjusted_price, minimum_nights, maximum_nights)
  - listings.csv: 21825 rows, 75 columns (id, listing_url, scrape_id, last_scraped, source, name, description, neighborhood_overview, picture_url, host_id, host_url, host_name, host_since, host_location, host_about, host_response_time, host_response_rate, host_acceptance_rate, host_is_superhost, host_thumbnail_url, host_picture_url, host_neighbourhood, host_listings_count, host_total_listings_count, host_verifications, host_has_profile_pic, host_identity_verified, neighbourhood, neighbourhood_cleansed, neighbourhood_group_cleansed, latitude, longitude, property_type, room_type, accommodates, bathrooms, bathrooms_text, bedrooms, beds, amenities, price, minimum_nights, maximum_nights, minimum_minimum_nights, maximum_minimum_nights, minimum_maximum_nights, maximum_maximum_nights, minimum_nights_avg_ntm, maximum_nights_avg_ntm, calendar_updated, has_availability, availability_30, availability_60, availability_90, availability_365, calendar_last_scraped, number_of_reviews, number_of_reviews_ltm, number_of_reviews_l30d, first_review, last_review, review_scores_rating, review_scores_accuracy, review_scores_cleanliness, review_scores_checkin, review_scores_communication, review_scores_location, review_scores_value, license, instant_bookable, calculated_host_listings_count, calculated_host_listings_count_entire_homes, calculated_host_listings_count_private_rooms, calculated_host_listings_count_shared_rooms, reviews_per_month)
  - listings2.csv: 21825 rows, 18 columns (id, name, host_id, host_name, neighbourhood_group, neighbourhood, latitude, longitude, room_type, price, minimum_nights, number_of_reviews, last_review, reviews_per_month, calculated_host_listings_count, availability_365, number_of_reviews_ltm, license)
  - reviews.csv: 573077 rows, 6 columns (listing_id, id, date, reviewer_id, reviewer_name, comments)
Installing Packages¶
!pip install -r requirements.txt
Requirement already satisfied: joblib==1.4.2 in c:\users\acer\anaconda3\envs\deeplearning\lib\site-packages (from -r requirements.txt (line 1)) (1.4.2)
Requirement already satisfied: numpy==1.26.4 in c:\users\acer\anaconda3\envs\deeplearning\lib\site-packages (from -r requirements.txt (line 2)) (1.26.4)
Requirement already satisfied: pandas==2.2.2 in c:\users\acer\anaconda3\envs\deeplearning\lib\site-packages (from -r requirements.txt (line 3)) (2.2.2)
Requirement already satisfied: plotly==5.24.1 in c:\users\acer\anaconda3\envs\deeplearning\lib\site-packages (from -r requirements.txt (line 4)) (5.24.1)
Requirement already satisfied: xgboost==2.1.2 in c:\users\acer\anaconda3\envs\deeplearning\lib\site-packages (from -r requirements.txt (line 5)) (2.1.2)
Requirement already satisfied: seaborn==0.13.2 in c:\users\acer\anaconda3\envs\deeplearning\lib\site-packages (from -r requirements.txt (line 6)) (0.13.2)
Requirement already satisfied: matplotlib==3.9.2 in c:\users\acer\anaconda3\envs\deeplearning\lib\site-packages (from -r requirements.txt (line 7)) (3.9.2)
Requirement already satisfied: scikit-learn==1.5.2 in c:\users\acer\anaconda3\envs\deeplearning\lib\site-packages (from -r requirements.txt (line 8)) (1.5.2)
Requirement already satisfied: python-dateutil>=2.8.2 in c:\users\acer\anaconda3\envs\deeplearning\lib\site-packages (from pandas==2.2.2->-r requirements.txt (line 3)) (2.9.0)
Requirement already satisfied: pytz>=2020.1 in c:\users\acer\anaconda3\envs\deeplearning\lib\site-packages (from pandas==2.2.2->-r requirements.txt (line 3)) (2024.2)
Requirement already satisfied: tzdata>=2022.7 in c:\users\acer\anaconda3\envs\deeplearning\lib\site-packages (from pandas==2.2.2->-r requirements.txt (line 3)) (2024.1)
Requirement already satisfied: tenacity>=6.2.0 in c:\users\acer\anaconda3\envs\deeplearning\lib\site-packages (from plotly==5.24.1->-r requirements.txt (line 4)) (9.0.0)
Requirement already satisfied: packaging in c:\users\acer\appdata\roaming\python\python312\site-packages (from plotly==5.24.1->-r requirements.txt (line 4)) (24.1)
Requirement already satisfied: scipy in c:\users\acer\anaconda3\envs\deeplearning\lib\site-packages (from xgboost==2.1.2->-r requirements.txt (line 5)) (1.14.1)
Requirement already satisfied: contourpy>=1.0.1 in c:\users\acer\anaconda3\envs\deeplearning\lib\site-packages (from matplotlib==3.9.2->-r requirements.txt (line 7)) (1.3.0)
Requirement already satisfied: cycler>=0.10 in c:\users\acer\anaconda3\envs\deeplearning\lib\site-packages (from matplotlib==3.9.2->-r requirements.txt (line 7)) (0.12.1)
Requirement already satisfied: fonttools>=4.22.0 in c:\users\acer\anaconda3\envs\deeplearning\lib\site-packages (from matplotlib==3.9.2->-r requirements.txt (line 7)) (4.53.1)
Requirement already satisfied: kiwisolver>=1.3.1 in c:\users\acer\anaconda3\envs\deeplearning\lib\site-packages (from matplotlib==3.9.2->-r requirements.txt (line 7)) (1.4.7)
Requirement already satisfied: pillow>=8 in c:\users\acer\anaconda3\envs\deeplearning\lib\site-packages (from matplotlib==3.9.2->-r requirements.txt (line 7)) (10.4.0)
Requirement already satisfied: pyparsing>=2.3.1 in c:\users\acer\anaconda3\envs\deeplearning\lib\site-packages (from matplotlib==3.9.2->-r requirements.txt (line 7)) (3.1.4)
Requirement already satisfied: threadpoolctl>=3.1.0 in c:\users\acer\anaconda3\envs\deeplearning\lib\site-packages (from scikit-learn==1.5.2->-r requirements.txt (line 8)) (3.5.0)
Requirement already satisfied: six>=1.5 in c:\users\acer\anaconda3\envs\deeplearning\lib\site-packages (from python-dateutil>=2.8.2->pandas==2.2.2->-r requirements.txt (line 3)) (1.16.0)
Importing Libraries¶
# Importing Libraries
import os
import ast
import joblib
import warnings
import datetime
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib as mpl
import plotly.express as px
from matplotlib import ticker
import matplotlib.pyplot as plt
from xgboost import XGBRegressor
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
from sklearn.metrics import silhouette_score, mean_squared_error, mean_absolute_error, r2_score
# Iterative Imputer
from sklearn.experimental import enable_iterative_imputer
from sklearn.impute import IterativeImputer
# Suppressing all the warnings
warnings.filterwarnings('ignore')
# Storing all the data files in a list
data_files = ['calendar', 'listings', 'listings2', 'reviews']
Data Quick Information¶
# Running a quick analysis over the shape and columns of all of our datasets
for file in data_files:
    print("File:-", file)
    # Reading every column as a string so no type inference is needed for a shape check
    df = pd.read_csv(f'Data-AirBNB//{file}.csv', dtype = str)
    rows, columns = df.shape
    print("Rows:-", rows)
    print("Columns:-", columns)
    print(', '.join(df.columns))
    print()
File:- calendar
Rows:- 7966127
Columns:- 7
listing_id, date, available, price, adjusted_price, minimum_nights, maximum_nights

File:- listings
Rows:- 21825
Columns:- 75
id, listing_url, scrape_id, last_scraped, source, name, description, neighborhood_overview, picture_url, host_id, host_url, host_name, host_since, host_location, host_about, host_response_time, host_response_rate, host_acceptance_rate, host_is_superhost, host_thumbnail_url, host_picture_url, host_neighbourhood, host_listings_count, host_total_listings_count, host_verifications, host_has_profile_pic, host_identity_verified, neighbourhood, neighbourhood_cleansed, neighbourhood_group_cleansed, latitude, longitude, property_type, room_type, accommodates, bathrooms, bathrooms_text, bedrooms, beds, amenities, price, minimum_nights, maximum_nights, minimum_minimum_nights, maximum_minimum_nights, minimum_maximum_nights, maximum_maximum_nights, minimum_nights_avg_ntm, maximum_nights_avg_ntm, calendar_updated, has_availability, availability_30, availability_60, availability_90, availability_365, calendar_last_scraped, number_of_reviews, number_of_reviews_ltm, number_of_reviews_l30d, first_review, last_review, review_scores_rating, review_scores_accuracy, review_scores_cleanliness, review_scores_checkin, review_scores_communication, review_scores_location, review_scores_value, license, instant_bookable, calculated_host_listings_count, calculated_host_listings_count_entire_homes, calculated_host_listings_count_private_rooms, calculated_host_listings_count_shared_rooms, reviews_per_month

File:- listings2
Rows:- 21825
Columns:- 18
id, name, host_id, host_name, neighbourhood_group, neighbourhood, latitude, longitude, room_type, price, minimum_nights, number_of_reviews, last_review, reviews_per_month, calculated_host_listings_count, availability_365, number_of_reviews_ltm, license

File:- reviews
Rows:- 571853
Columns:- 6
listing_id, id, date, reviewer_id, reviewer_name, comments
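The shape check above loads each file fully into memory, which is heavy for a file the size of calendar.csv. As a lighter alternative, the sketch below reads only the header for the column names and then streams the file in chunks to count rows. The helper name `quick_shape` and the default `chunksize` are assumptions made for illustration; they are not part of the original notebook.

```python
import pandas as pd

def quick_shape(path, chunksize=100_000):
    """Return (row_count, column_names) for a CSV without loading it all at once.

    Hypothetical helper for illustration only.
    """
    # nrows=0 parses just the header line, which is enough for the column names
    columns = pd.read_csv(path, nrows=0).columns.tolist()
    # Stream the file chunk by chunk and add up the number of rows in each chunk
    rows = sum(
        len(chunk)
        for chunk in pd.read_csv(path, dtype=str, chunksize=chunksize)
    )
    return rows, columns
```

It could be called as `quick_shape('Data-AirBNB//calendar.csv')`, assuming the same folder layout as the cell above.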
calendar file:¶
- listing_id: Unique identifier for each listing.
- date: Date for the listing's availability.
- available: Whether the listing is available on that date (true/false).
- price: Price of the listing per night.
- adjusted_price: Price adjusted based on discounts or surcharges.
- minimum_nights: Minimum number of nights a guest can book.
- maximum_nights: Maximum number of nights a guest can book.
listings file:¶
- id: Unique identifier for the listing.
- listing_url: URL to the listing on Airbnb.
- scrape_id: Identifier for the data scrape session.
- last_scraped: Date when the listing was last scraped.
- source: Source of the listing data (typically Airbnb).
- name: Name/title of the listing.
- description: Detailed description of the listing.
- neighborhood_overview: Overview of the neighborhood where the listing is located.
- picture_url: URL of the listing's main image.
- host_id: Unique identifier for the host.
- host_url: URL to the host's profile on Airbnb.
- host_name: Name of the host.
- host_since: Date when the host joined Airbnb.
- host_location: Location of the host.
- host_about: Personal description written by the host.
- host_response_time: How quickly the host typically responds.
- host_response_rate: Host's response rate as a percentage.
- host_acceptance_rate: Host's acceptance rate for booking requests.
- host_is_superhost: Indicates if the host is a "Superhost" (true/false).
- host_thumbnail_url: URL to the host's thumbnail image.
- host_picture_url: URL to the host's main image.
- host_neighbourhood: Neighbourhood where the host resides.
- host_listings_count: Number of listings the host manages.
- host_total_listings_count: Total number of listings associated with the host.
- host_verifications: Verification methods completed by the host.
- host_has_profile_pic: Indicates if the host has a profile picture (true/false).
- host_identity_verified: Indicates if the host's identity is verified (true/false).
- neighbourhood: Neighbourhood where the listing is located.
- neighbourhood_cleansed: Standardized neighborhood name.
- neighbourhood_group_cleansed: Larger grouping of neighborhoods (if available).
- latitude: Latitude coordinate of the listing.
- longitude: Longitude coordinate of the listing.
- property_type: Type of property (e.g., apartment, house).
- room_type: Type of room offered (e.g., entire place, private room).
- accommodates: Number of guests the listing can accommodate.
- bathrooms: Number of bathrooms.
- bathrooms_text: Descriptive text about the bathroom setup.
- bedrooms: Number of bedrooms.
- beds: Number of beds.
- amenities: List of amenities provided.
- price: Price of the listing per night.
- minimum_nights: Minimum number of nights required for booking.
- maximum_nights: Maximum number of nights allowed for booking.
- minimum_minimum_nights: Shortest minimum night requirement across booking windows.
- maximum_minimum_nights: Longest minimum night requirement across booking windows.
- minimum_maximum_nights: Shortest maximum night limit across booking windows.
- maximum_maximum_nights: Longest maximum night limit across booking windows.
- minimum_nights_avg_ntm: Average minimum nights required for future bookings.
- maximum_nights_avg_ntm: Average maximum nights allowed for future bookings.
- calendar_updated: How recently the calendar was updated.
- has_availability: Indicates if the listing has availability (true/false).
- availability_30: Number of available nights in the next 30 days.
- availability_60: Number of available nights in the next 60 days.
- availability_90: Number of available nights in the next 90 days.
- availability_365: Number of available nights in the next 365 days.
- calendar_last_scraped: Date when the calendar was last scraped.
- number_of_reviews: Total number of reviews for the listing.
- number_of_reviews_ltm: Number of reviews in the last 12 months.
- number_of_reviews_l30d: Number of reviews in the last 30 days.
- first_review: Date of the first review for the listing.
- last_review: Date of the most recent review.
- review_scores_rating: Overall rating score based on guest reviews.
- review_scores_accuracy: Accuracy rating based on guest reviews.
- review_scores_cleanliness: Cleanliness rating based on guest reviews.
- review_scores_checkin: Check-in process rating based on guest reviews.
- review_scores_communication: Communication rating based on guest reviews.
- review_scores_location: Location rating based on guest reviews.
- review_scores_value: Value-for-money rating based on guest reviews.
- license: License number for the listing (if applicable).
- instant_bookable: Indicates if the listing is available for instant booking (true/false).
- calculated_host_listings_count: Number of listings under the same host.
- calculated_host_listings_count_entire_homes: Number of entire home listings by the host.
- calculated_host_listings_count_private_rooms: Number of private room listings by the host.
- calculated_host_listings_count_shared_rooms: Number of shared room listings by the host.
- reviews_per_month: Average number of reviews the listing receives per month.
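Since `bathrooms_text` is free text while `bathrooms` is numeric, one way to recover a number from the text is sketched below. The helper `parse_bathrooms_text`, the output column names, and the choice to count a half-bath as 0.5 are assumptions for illustration; the sample strings follow the format seen in this dataset (for example '1.5 shared baths' and 'Half-bath').

```python
import re

import numpy as np
import pandas as pd

def parse_bathrooms_text(text):
    """Return (count, is_shared) parsed from a bathrooms_text value.

    Hypothetical helper: a half-bath is counted as 0.5 (an assumption).
    """
    if pd.isna(text):
        return np.nan, np.nan
    lowered = str(text).lower()
    is_shared = "shared" in lowered
    # Look for a leading number such as '3' or '1.5'
    match = re.search(r"\d+(\.\d+)?", lowered)
    if match:
        count = float(match.group())
    elif "half" in lowered:
        count = 0.5
    else:
        count = np.nan
    return count, is_shared

# Toy rows that mimic the value format; not taken from the real file
sample = pd.DataFrame(
    {"bathrooms_text": ["3 baths", "1.5 shared baths", "Half-bath", "Shared half-bath", None]}
)
parsed = pd.DataFrame(
    sample["bathrooms_text"].apply(parse_bathrooms_text).tolist(),
    columns=["bathrooms_count", "bathrooms_shared"],
    index=sample.index,
)
sample = sample.join(parsed)
print(sample)
```

A parsed count like this could be used to fill gaps in `bathrooms`, though whether that is appropriate depends on how the two columns disagree in the real data.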
listings2 file:¶
- id: Unique identifier for the listing.
- name: Name/title of the listing.
- host_id: Unique identifier for the host.
- host_name: Name of the host.
- neighbourhood_group: Larger grouping of neighborhoods (if available).
- neighbourhood: Neighbourhood where the listing is located.
- latitude: Latitude coordinate of the listing.
- longitude: Longitude coordinate of the listing.
- room_type: Type of room offered (e.g., entire place, private room).
- price: Price of the listing per night.
- minimum_nights: Minimum number of nights required for booking.
- number_of_reviews: Total number of reviews for the listing.
- last_review: Date of the most recent review.
- reviews_per_month: Average number of reviews the listing receives per month.
- calculated_host_listings_count: Number of listings under the same host.
- availability_365: Number of available nights in the next 365 days.
- number_of_reviews_ltm: Number of reviews in the last 12 months.
- license: License number for the listing (if applicable).
reviews file:¶
- listing_id: Unique identifier for the listing being reviewed.
- id: Unique identifier for the review.
- date: Date when the review was posted.
- reviewer_id: Unique identifier for the reviewer.
- reviewer_name: Name of the reviewer.
- comments: Comments left by the reviewer.
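The dictionaries above suggest that `listing_id` in the reviews (and calendar) file refers to `id` in the listings file. A minimal sketch of linking the two is shown below, using toy frames rather than the real files; the frame names, the `review_rows` column, and the toy values are all assumptions for illustration.

```python
import pandas as pd

# Toy stand-ins for listings.csv and reviews.csv (values are made up)
listings_toy = pd.DataFrame({"id": [1419, 8077, 26654], "name": ["A", "B", "C"]})
reviews_toy = pd.DataFrame(
    {
        "listing_id": [1419, 1419, 8077],
        "id": [1, 2, 3],
        "comments": ["Great", "Nice", "Ok"],
    }
)

# Count review rows per listing
review_counts = (
    reviews_toy.groupby("listing_id").size().rename("review_rows").reset_index()
)

# A left join keeps listings that have no reviews; their count becomes 0
merged = listings_toy.merge(
    review_counts, how="left", left_on="id", right_on="listing_id"
)
merged["review_rows"] = merged["review_rows"].fillna(0).astype(int)
print(merged[["id", "name", "review_rows"]])
```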
EDA¶
We will use the listings dataset for segmentation, so the EDA and the modelling below are framed around the segmentation problem.
# Reading the listings dataset, as this will be used for segmentation
df = pd.read_csv("Data-AirBNB//listings.csv")
listings = df.copy()
listings.head(5)
|  | id | listing_url | scrape_id | last_scraped | source | name | description | neighborhood_overview | picture_url | host_id | ... | review_scores_communication | review_scores_location | review_scores_value | license | instant_bookable | calculated_host_listings_count | calculated_host_listings_count_entire_homes | calculated_host_listings_count_private_rooms | calculated_host_listings_count_shared_rooms | reviews_per_month |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1419 | https://www.airbnb.com/rooms/1419 | 2.024090e+13 | 9/6/2024 | previous scrape | Beautiful home in amazing area! | This large, family home is located in one of T... | The apartment is located in the Ossington stri... | https://a0.muscache.com/pictures/76206750/d643... | 1565 | ... | 5.00 | 5.00 | 5.00 | NaN | f | 1 | 1 | 0 | 0 | 0.05 |
| 1 | 8077 | https://www.airbnb.com/rooms/8077 | 2.024090e+13 | 9/6/2024 | previous scrape | Downtown Harbourfront Private Room | Guest room in a luxury condo with access to al... | NaN | https://a0.muscache.com/pictures/11780344/141c... | 22795 | ... | 4.90 | 4.92 | 4.83 | NaN | f | 2 | 1 | 1 | 0 | 0.92 |
| 2 | 26654 | https://www.airbnb.com/rooms/26654 | 2.024090e+13 | 9/6/2024 | city scrape | World Class @ CN Tower, convention centre, The... | CN Tower, TIFF Bell Lightbox, Metro Convention... | There's a reason they call it the Entertainmen... | https://a0.muscache.com/pictures/81811785/5dcd... | 113345 | ... | 4.76 | 4.86 | 4.67 | NaN | f | 5 | 5 | 0 | 0 | 0.25 |
| 3 | 27423 | https://www.airbnb.com/rooms/27423 | 2.024090e+13 | 9/6/2024 | city scrape | Executive Studio Unit- Ideal for One Person | Brand new, fully furnished studio basement apa... | NaN | https://a0.muscache.com/pictures/176936/b687ed... | 118124 | ... | 5.00 | 4.87 | 4.87 | NaN | f | 1 | 1 | 0 | 0 | 0.17 |
| 4 | 30931 | https://www.airbnb.com/rooms/30931 | 2.024090e+13 | 9/6/2024 | previous scrape | Downtown Toronto - Waterview Condo | Split level waterfront condo with a breathtaki... | NaN | https://a0.muscache.com/pictures/227971/e8ebd7... | 22795 | ... | NaN | NaN | NaN | NaN | f | 2 | 1 | 1 | 0 | 0.01 |
5 rows × 75 columns
# Shape of our dataset
listings.shape
(21825, 75)
Let's go over the features to see which ones are obvious candidates for deletion.
listings.columns
Index(['id', 'listing_url', 'scrape_id', 'last_scraped', 'source', 'name',
'description', 'neighborhood_overview', 'picture_url', 'host_id',
'host_url', 'host_name', 'host_since', 'host_location', 'host_about',
'host_response_time', 'host_response_rate', 'host_acceptance_rate',
'host_is_superhost', 'host_thumbnail_url', 'host_picture_url',
'host_neighbourhood', 'host_listings_count',
'host_total_listings_count', 'host_verifications',
'host_has_profile_pic', 'host_identity_verified', 'neighbourhood',
'neighbourhood_cleansed', 'neighbourhood_group_cleansed', 'latitude',
'longitude', 'property_type', 'room_type', 'accommodates', 'bathrooms',
'bathrooms_text', 'bedrooms', 'beds', 'amenities', 'price',
'minimum_nights', 'maximum_nights', 'minimum_minimum_nights',
'maximum_minimum_nights', 'minimum_maximum_nights',
'maximum_maximum_nights', 'minimum_nights_avg_ntm',
'maximum_nights_avg_ntm', 'calendar_updated', 'has_availability',
'availability_30', 'availability_60', 'availability_90',
'availability_365', 'calendar_last_scraped', 'number_of_reviews',
'number_of_reviews_ltm', 'number_of_reviews_l30d', 'first_review',
'last_review', 'review_scores_rating', 'review_scores_accuracy',
'review_scores_cleanliness', 'review_scores_checkin',
'review_scores_communication', 'review_scores_location',
'review_scores_value', 'license', 'instant_bookable',
'calculated_host_listings_count',
'calculated_host_listings_count_entire_homes',
'calculated_host_listings_count_private_rooms',
'calculated_host_listings_count_shared_rooms', 'reviews_per_month'],
dtype='object')
After checking the features above, the following are obvious candidates for deletion:
- id
- listing_url
- scrape_id
- last_scraped
- source
- name
- description
- neighborhood_overview
- picture_url
- host_id
- host_url
- host_name
- host_location
- host_about
- host_thumbnail_url
- host_picture_url
- neighbourhood
- neighbourhood_group_cleansed
- calendar_updated
- calendar_last_scraped
These features are not needed for further analysis.
# Dropping unnecessary features from the dataset
listings.drop(
    columns = [
        'id', 'listing_url', 'scrape_id', 'last_scraped', 'source', 'name', 'description', 'neighborhood_overview',
        'picture_url', 'host_id', 'host_url', 'host_name', 'host_location', 'host_about',
        'host_thumbnail_url', 'host_picture_url', 'neighbourhood',
        'neighbourhood_group_cleansed', 'calendar_updated', 'calendar_last_scraped'
    ], inplace = True
)
listings.shape
(21825, 55)
We have dropped the obvious features, but 55 features still remain. Several of them carry values that are not suitable for modelling or encoding as they stand, so we will transform them before moving forward.
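To make the kind of transformation concrete, here is a minimal sketch on toy rows. In this dataset, `host_response_rate` holds percentage strings such as '100%', `host_is_superhost` holds 't'/'f' flags, and `host_verifications` holds stringified lists such as "['email', 'phone']". The target encodings (fractions, 0/1, item counts), the helper `count_items`, and the new column name are assumptions for illustration, not necessarily the choices made later in the notebook.

```python
import ast

import numpy as np
import pandas as pd

# Toy rows that mimic the raw value formats; not taken from the real file
sample = pd.DataFrame(
    {
        "host_response_rate": ["100%", "77%", np.nan],
        "host_is_superhost": ["f", "t", np.nan],
        "host_verifications": ["['email', 'phone']", "[]", np.nan],
    }
)

# '77%' -> 0.77; missing values stay NaN
sample["host_response_rate"] = (
    sample["host_response_rate"].str.rstrip("%").astype(float) / 100
)

# 't'/'f' -> 1/0; missing values stay NaN
sample["host_is_superhost"] = sample["host_is_superhost"].map({"t": 1, "f": 0})

def count_items(value):
    """Count the entries of a stringified list; NaN when the value is missing."""
    if pd.isna(value):
        return np.nan
    # literal_eval safely parses the string into a Python list
    return len(ast.literal_eval(value))

sample["host_verifications_count"] = sample["host_verifications"].apply(count_items)
print(sample)
```

The same percentage handling would apply to `host_acceptance_rate`, and the 't'/'f' mapping to the other flag columns such as `host_has_profile_pic` and `host_identity_verified`.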
# Checking the unique values of all the features:
for col in listings.columns:
    print(f"{col} = {listings[col].unique()}")
    print()
host_since = ['8/8/2008' '6/22/2009' '4/25/2010' ... '8/31/2024' '9/2/2024' '9/3/2024']
host_response_time = [nan 'within a few hours' 'within an hour' 'within a day'
'a few days or more']
host_response_rate = [nan '100%' '77%' '50%' '88%' '80%' '0%' '97%' '33%' '90%' '86%' '94%'
'96%' '75%' '67%' '91%' '98%' '69%' '60%' '40%' '92%' '95%' '25%' '70%'
'20%' '30%' '76%' '83%' '89%' '78%' '93%' '99%' '79%' '71%' '85%' '65%'
'10%' '73%' '8%' '63%' '82%' '57%' '13%' '14%' '17%' '45%' '6%' '74%'
'47%' '87%' '9%' '26%' '81%' '55%' '62%' '27%' '58%' '84%' '22%' '46%'
'64%' '29%']
host_acceptance_rate = [nan '38%' '100%' '60%' '62%' '94%' '89%' '50%' '0%' '96%' '86%' '83%'
'46%' '42%' '75%' '95%' '92%' '80%' '67%' '82%' '40%' '98%' '97%' '71%'
'87%' '73%' '69%' '78%' '93%' '61%' '76%' '91%' '37%' '90%' '88%' '66%'
'84%' '99%' '65%' '74%' '33%' '17%' '77%' '85%' '79%' '56%' '70%' '59%'
'31%' '68%' '14%' '63%' '20%' '25%' '28%' '48%' '81%' '43%' '29%' '64%'
'51%' '53%' '22%' '49%' '44%' '15%' '30%' '27%' '24%' '39%' '58%' '35%'
'21%' '72%' '57%' '55%' '36%' '11%' '34%' '47%' '18%' '52%' '8%' '5%'
'13%' '54%' '41%' '23%' '12%' '26%' '45%' '9%' '32%' '16%' '10%' '2%'
'7%']
host_is_superhost = ['f' 't' nan]
host_neighbourhood = ['Commercial Drive' 'Harbourfront' 'Entertainment District'
'Greenwood-Coxwell' 'Parkdale' 'The Beaches' 'Rosedale' 'Niagara'
'High Park North' 'Scarborough City Centre' 'Downtown Toronto'
'The Junction' 'Oakridge' 'Little Portugal' 'Studio District'
'Garden District' 'Yorkville' 'The Annex' 'Fairbank' 'Deer Park'
'The Pocket' 'Davisville' 'Willowdale' 'Fashion District'
'Flemingdon Park' 'The Danforth' 'Amesbury' 'Oakwood' 'Dovercourt Park'
'Trinity-Bellwoods' 'Roncesvalles' 'Palmerston/Little Italy' 'Mimico'
'Riverdale' 'Woodbine Corridor' 'Cliffside' 'Broadview North'
'Morningside' 'Cabbagetown' 'Saint Lawrence' 'Don Valley Village' nan
'South Hill/Rathnelly' 'Wallace Emerson' 'Danforth Village' 'Corktown'
'Westminster/Branson' 'Greek Town' 'Cedarvale Humewood' 'Dufferin Grove'
'Islington' 'Parkwoods' 'Old East York' 'Agincourt'
'Saint Andrew/Windfields' 'Yonge Eglinton' 'Bedford Park' 'Bendale'
'Glen Park' 'Le Plateau' 'Mount Dennis' 'Newtonbrook'
'Stonegate-Queensway' 'Clanton Park' 'Bayview' 'Henry Farm' 'Cliffcrest'
'Kensington Market' 'New Toronto' 'Runnymede' 'Lytton Park' 'Forest Hill'
'Guildwood' 'Alderwood' 'Lambton Baby Point' 'Woodbine/Lumsden'
'Wychwood Park' "Tam O'Shanter" 'Parkview' "L'Amoreaux" 'Birch Cliff'
'Casa Loma' 'Wexford/Maryvale' 'Lawrence Park' 'York University Heights'
'Leaside' 'Pellam Park' 'The Kingsway' 'Eglinton East'
'Financial District' 'Ionview' 'Swansea' 'Long Branch' 'Bayview Village'
'Weston' 'Eringate' 'West Humber' 'Unionville' 'Jane and Finch'
'Westmount' 'Morningside Heights' 'Rexdale' 'West Rouge' 'Richview'
'Don Mills' 'Santa Monica' 'Markland Woods' 'Princess'
'Rockcliffe Smythe' 'Dorset Park' 'Armour Heights' 'Thorncliffe Park'
'Clairlea' 'Malvern' 'Pelmo Park' 'Etobicoke West Mall' 'Thistletown'
'West Hill' 'Scarborough Junction' 'Mount Olive' 'The Westway'
'Scarborough Village' 'Manse Valley' 'Keelesdale' 'Woburn'
'Humber Valley' 'Highland Creek' 'The Elms' 'Nortown' 'Clinton Hill'
'Pleasant View' 'Govalle' 'Sunnybrook' 'Victoria Village' 'Burnaby'
'Thornhill' 'North Park' 'Hillcrest Village' 'Downsview' 'Samac'
'Humbermede' ' Puntarenas residence' 'Port Union' 'West Oak Trails'
'Recoleta' 'Upper East Side' 'Downtown Vancouver' 'Milliken' 'Calica'
'Ocean Park' 'Humberlea' 'Humber Summit' 'South Cambie' 'Crescent Town'
'Malvern West' 'Downtown Montreal' 'Merkaz HaIr' 'Queenston' 'South Core'
'Beachborough' 'Lauderdale Isles' 'Beltline' 'Bellas Vistas' 'Sherkston'
'Fenelon Falls' 'Erin Mills' 'Recreio dos Bandeirantes' 'Downtown Miami'
'KDA Scheme 5' 'Cote-des-Neiges' 'Vanier' 'Lakeshore' 'Astrodome'
'CONDOMINIOS CANTA MAR' 'Chappel East' 'Northside' 'University'
'Ancaster' 'Cooksville' 'Kitsilano' 'Woodbridge' 'Marpole'
'Berczy Village' 'Santo Agostinho' 'Lakeview'
'Deutschstown Historic District' 'Bayview Glen' 'Antarayin'
'Letitia Heights' 'Bolton' 'Port Dalhousie' 'Western Hill' 'Maple'
'East Credit' 'Clearview' 'La Veleta' 'Somerset Brooke' 'Kamay'
'Sage Hill' 'Valley Creek' 'Stipley' 'Waterdown' 'Grapeview' 'Venice'
'Coboconk' 'KW Hospital' 'Landsdale' 'Keswick' 'Burnt River'
'Shoreline West' 'Meadowvale' 'Knight' 'Central Vancouver'
'Civic Hospital - Experimental Farm - Central Park'
'Industrial Sector A and Keith' 'Khu phố 3' 'Willmott' 'East Windsor'
'Phường 3' 'South Fort Lauderdale' 'Central Hamilton' 'Cadboro Bay'
'Acton' 'Aldershot' 'Centretown' 'Midtown Toronto' 'Northglen'
'Sainte-Rose' 'Crestview' 'Nautilus' 'Bebedero' 'Glendale' 'Donevan'
'Dixie' 'Concord' 'Fairview' 'Hollywood Lakes' 'Barra do Cunhau'
'Bonnington' 'Malton' 'Willow Beach' 'Symons Valley' 'ChampionsGate'
'District des Riverains' 'High Park-Swansea' 'Notre-Dame-de-Grace'
'White Oaks' 'Kedron' 'Allapattah' 'Downtown' 'Pinheiros'
'Wismer Commons' 'Little Havana' 'Rosemary District' 'Port Sydney'
'Lower Mount Royal' 'Castle Green' 'South Beach' 'Silvertown' 'West Bend'
'Bedford-Stuyvesant' 'Uptown Core' 'Crescent Heights' 'Port Credit'
'Old Malton Village' 'Playa Pelada' 'Cambuí' 'Streetsville' 'Sector B'
'Willoughby' 'Glenridge' 'Windfields' 'Flatlands' 'Central City'
'Victoria Island' 'West End' 'Notting Hill' 'Paradise Valley Village'
'La Florida' 'Beverley Glen' 'Historic Old Town' 'LB of Islington'
'Varna Center' 'Spring Valley' 'South Los Angeles' 'Far North Dallas'
'Zona Hotelera' 'Mount Hope' 'Central Oshawa'
'Afton Oaks / River Oaks Area' 'La Bainerie' 'Hamilton Road'
'Normanhurst' 'Country Hills East' 'Whitmore Park' 'Saint-Henri' 'Medway'
'Westover Hills' 'Carling' 'Beechwood' 'Evanston' 'Paia' 'Paquita'
'Jackson' 'Heritage Valley' 'North Glenora' 'North End East'
'Golf Club Manor' 'Reunion' 'East Vancouver' 'Highland Lakes'
'Victorian District - East' 'Sandpointe' 'Downtown Dartmouth'
'Fifth by Northwest' 'Sainte-Dorothée' 'Tempo' "Za'abeel 1"
'Saint-Timothée' 'West Oakville' 'Burj Residence Phase I & II' 'Mineola'
'Riverside' 'Kerrisdale' 'Durand' 'Westboro']
host_listings_count = [ 1. 2. 5. 4. 3. 9. 8. 19. 6. 7. 103. 34. 11. 14.
16. 10. 13. 22. 25. 17. 12. 29. 15. nan 82. 38. 33. 26.
18. 105. 21. 27. 284. 36. 484. 46. 30. 32. 47. 39. 20. 140.
24. 44. 28. 35. 50. 51. 63. 45. 95. 75. 84. 52. 31. 83.
23. 89. 96. 49. 182. 48. 37. 55. 147. 90. 152. 54. 41.]
host_total_listings_count = [ 1. 3. 10. 5. 6. 19. 18. 2. 8. 4. 7. 9. 24. 14.
180. 11. 41. 17. 29. 16. 13. 96. 27. 26. 45. 60. 32. 33.
nan 23. 12. 54. 37. 20. 106. 46. 53. 55. 109. 84. 15. 67.
268. 30. 22. 62. 21. 42. 34. 39. 28. 312. 44. 59. 40. 619.
50. 334. 38. 79. 31. 48. 121. 150. 86. 61. 123. 63. 47. 216.
35. 25. 555. 43. 49. 66. 75. 194. 36. 100. 88. 176. 172. 57.
52. 108. 58. 219. 242. 190. 164. 101. 94. 51. 91. 171.]
host_verifications = ["['email', 'phone']" "['email', 'phone', 'work_email']" "['phone']"
"['phone', 'work_email']" "['email']" "['work_email']" '[]'
"['email', 'work_email']" nan]
host_has_profile_pic = ['t' 'f' nan]
host_identity_verified = ['t' 'f' nan]
neighbourhood_cleansed = ['Little Portugal' 'Waterfront Communities-The Island' 'South Riverdale'
'South Parkdale' 'The Beaches' 'Rosedale-Moore Park'
'Bay Street Corridor' 'Church-Yonge Corridor' 'Niagara' 'High Park North'
'Woburn' 'Junction Area' 'Oakridge' 'Cabbagetown-South St.James Town'
'Annex' 'Caledonia-Fairbank' 'Casa Loma' 'North St.James Town'
'Blake-Jones' 'Moss Park' 'Mount Pleasant West' 'Willowdale East'
'Palmerston-Little Italy' 'Flemingdon Park' 'East End-Danforth'
'Brookhaven-Amesbury' 'Oakwood Village'
'Dovercourt-Wallace Emerson-Junction' 'Trinity-Bellwoods' 'Roncesvalles'
'Mimico (includes Humber Bay Shores)' 'Woodbine Corridor'
'Birchcliffe-Cliffside' 'Broadview North' 'Morningside'
'Kensington-Chinatown' 'High Park-Swansea' 'Don Valley Village'
'Danforth' 'Newtonbrook West' 'Playter Estates-Danforth'
'Greenwood-Coxwell' 'Regent Park' 'Dufferin Grove' 'North Riverdale'
'Humewood-Cedarvale' 'Mount Pleasant East' 'Taylor-Massey' 'University'
'Islington-City Centre West' 'Parkwoods-Donalda' 'Yonge-St.Clair'
'Old East York' 'Corso Italia-Davenport' 'Agincourt South-Malvern West'
'St.Andrew-Windfields' 'Yonge-Eglinton' 'Lawrence Park North' 'Bendale'
'Englemount-Lawrence' 'Mount Dennis' 'Willowdale West'
'Stonegate-Queensway' 'Rockcliffe-Smythe' 'Clanton Park'
'Bayview Woods-Steeles' 'Bayview Village' 'Cliffcrest' 'New Toronto'
'Agincourt North' 'Etobicoke West Mall' 'Bedford Park-Nortown'
'Forest Hill South' 'Guildwood' 'Alderwood' "L'Amoreaux"
'Lambton Baby Point' 'Woodbine-Lumsden' 'Danforth East York'
'Bridle Path-Sunnybrook-York Mills' 'Wychwood'
'Runnymede-Bloor West Village' "Tam O'Shanter-Sullivan"
'Lansing-Westgate' 'Long Branch' 'Steeles' 'Wexford/Maryvale'
'Lawrence Park South' 'York University Heights' 'Briar Hill-Belgravia'
'Westminster-Branson' 'Leaside-Bennington' 'Hillcrest Village'
'Weston-Pellam Park' 'Bathurst Manor' 'Kingsway South' 'Ionview'
'Downsview-Roding-CFB' 'Weston' 'Pelmo Park-Humberlea'
'Clairlea-Birchmount' 'Eglinton East' 'Yorkdale-Glen Park'
'Eringate-Centennial-West Deane' 'West Humber-Clairville' 'Kennedy Park'
'Newtonbrook East' 'Black Creek' 'Beechborough-Greenbrook'
'Edenbridge-Humber Valley' 'Rouge' 'West Hill' 'Rexdale-Kipling'
'Willowridge-Martingrove-Richview' "O'Connor-Parkview" 'Victoria Village'
'Henry Farm' 'Banbury-Don Mills' 'Markland Wood' 'Princess-Rosethorn'
'Dorset Park' 'Kingsview Village-The Westway' 'Keelesdale-Eglinton West'
'Thorncliffe Park' 'Scarborough Village' 'Malvern' 'Pleasant View'
'Thistletown-Beaumond Heights' 'Mount Olive-Silverstone-Jamestown'
'Glenfield-Jane Heights' 'Highland Creek' 'Elms-Old Rexdale'
'Forest Hill North' 'Maple Leaf' 'Humbermede' 'Humber Heights-Westmount'
'Centennial Scarborough' 'Milliken' 'Humber Summit' 'Rustic']
latitude = [43.6459 43.6408 43.64608 ... 43.67552425 43.6584633
43.64129161]
longitude = [-79.42423 -79.37673 -79.39032 ... -79.44212902 -79.3841276
-79.39637268]
property_type = ['Entire home' 'Private room in rental unit' 'Entire condo'
'Entire rental unit' 'Private room in condo' 'Private room in home'
'Entire townhouse' 'Entire loft' 'Entire guest suite'
'Private room in townhouse' 'Entire serviced apartment'
'Shared room in rental unit' 'Private room in guest suite'
'Entire guesthouse' 'Private room in cottage' 'Entire place'
'Private room in bungalow' 'Private room in loft' 'Private room'
'Private room in serviced apartment' 'Entire bungalow'
'Shared room in home' 'Private room in guesthouse' 'Shared room in condo'
'Private room in bed and breakfast' 'Shared room in townhouse'
'Private room in barn' 'Entire villa' 'Tiny home' 'Floor'
'Private room in villa' 'Shared room in hostel' 'Entire cottage'
'Private room in castle' 'Shared room in loft' 'Entire home/apt'
'Private room in hostel' 'Shared room in guesthouse' 'Camper/RV'
'Room in boutique hotel' 'Shared room in bungalow' 'Earthen home'
'Shared room in boat' 'Private room in tiny home' 'Room in hotel'
'Private room in earthen home' 'Boat' 'Island'
'Private room in casa particular' 'Entire vacation home'
'Private room in vacation home' 'Room in aparthotel' 'Castle'
'Shipping container' 'Shared room in bed and breakfast'
'Shared room in hotel' 'Shared room in casa particular' 'Cave'
'Private room in cycladic house' 'Shared room']
room_type = ['Entire home/apt' 'Private room' 'Shared room']
accommodates = [10 2 4 1 5 3 6 8 7 9 16 13 14 12 11 15]
bathrooms = [nan 1. 0.5 2. 1.5 2.5 4. 5. 3. 0. 4.5 3.5 5.5 6.5 6. 8. ]
bathrooms_text = ['3 baths' '1.5 baths' '1 bath' '1 private bath' '1 shared bath'
'Half-bath' '2 baths' '1.5 shared baths' '0 baths' '2.5 baths' '4 baths'
'5 baths' '2 shared baths' '3.5 baths' '0 shared baths' '3 shared baths'
'4.5 baths' nan '5.5 baths' '6.5 baths' '4 shared baths'
'2.5 shared baths' 'Shared half-bath' '3.5 shared baths'
'4.5 shared baths' '6 baths' 'Private half-bath' '8 baths']
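The `bathrooms_text` values above mix a count, a "shared"/"private" qualifier, and the special "half-bath" wording. A minimal sketch of how they could be split into a numeric count and a shared flag follows; the helper name `parse_bathrooms` and the choice to treat any half-bath label as 0.5 are assumptions for illustration, not the notebook's own cleaning step.

```python
import re

import numpy as np
import pandas as pd


def parse_bathrooms(text):
    """Return (bathroom_count, is_shared) for one bathrooms_text value."""
    # Missing labels stay missing in both outputs.
    if pd.isna(text):
        return np.nan, np.nan
    lowered = text.lower()
    shared = "shared" in lowered
    # 'Half-bath', 'Shared half-bath' and 'Private half-bath' carry no digit.
    if "half-bath" in lowered:
        return 0.5, shared
    # Otherwise take the first integer or decimal number in the label.
    match = re.search(r"\d+(?:\.\d+)?", lowered)
    count = float(match.group()) if match else np.nan
    return count, shared


# Sample values copied from the output above.
sample = pd.Series(["3 baths", "1.5 shared baths", "Half-bath", "Shared half-bath", np.nan])
parsed = sample.apply(parse_bathrooms)
bath_count = parsed.apply(lambda pair: pair[0])
bath_shared = parsed.apply(lambda pair: pair[1])
```

The parsed count could then be used to fill the `nan` entries seen in the numeric `bathrooms` column, if the two columns turn out to agree where both are present.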
bedrooms = [ 5. nan 1. 0. 2. 3. 4. 9. 8. 6. 7. 50. 12. 10.]
beds = [nan 2. 1. 3. 4. 5. 6. 0. 7. 8. 9. 10. 12. 11.]
amenities = ['["TV", "First aid kit", "Wifi", "Kitchen", "Dryer", "Essentials", "Indoor fireplace", "Shampoo", "Smoke alarm", "Washer", "Heating", "Air conditioning", "Fire extinguisher"]'
'["Wifi", "Pool", "TV with standard cable", "Shampoo", "Free parking on premises", "Elevator", "Smoke alarm", "Gym", "Heating", "Air conditioning"]'
'["Wifi", "Paid parking on premises", "Essentials", "Elevator", "Extra pillows and blankets", "Long term stays allowed", "Iron", "Dedicated workspace", "Electric stove", "Single level home", "Bed linens", "Free washer \\u2013 In unit", "Building staff", "Smoke alarm", "Hot water", "Heating", "Oven", "Children\\u2019s dinnerware", "Hair dryer", "Pets allowed", "Luggage dropoff allowed", "Dishwasher", "Coffee maker", "Free dryer \\u2013 In unit", "Dishes and silverware", "Self check-in", "Microwave", "Patio or balcony", "Fire extinguisher", "Private entrance", "Kitchen", "Refrigerator", "Exercise equipment", "TV with standard cable", "Shared gym in building", "Shampoo", "City skyline view", "Shared pool - available all year", "Carbon monoxide alarm", "Central air conditioning", "Cooking basics", "Private hot tub", "Hangers"]'
...
'["Wifi", "Paid parking on premises", "Dryer", "Elevator", "Long term stays allowed", "Toaster", "Iron", "TV", "Bed linens", "Hot water kettle", "Smoke alarm", "Freezer", "Hot water", "Heating", "Oven", "Air conditioning", "Baking sheet", "Hair dryer", "Pets allowed", "Housekeeping - included with your stay", "Dining table", "Dishwasher", "Coffee maker", "Dishes and silverware", "Pool table", "BBQ grill", "Bathtub", "Microwave", "Exterior security cameras on property", "Kitchen", "Wine glasses", "Movie theater", "Refrigerator", "Private patio or balcony", "Exercise equipment", "Shared gym in building", "Stove", "Blender", "Washer", "Extra pillows and blankets", "Cooking basics", "Clothing storage", "Hangers"]'
'["Wifi", "Paid parking on premises", "Dryer", "Elevator", "Long term stays allowed", "Toaster", "Iron", "Dedicated workspace", "TV", "Bed linens", "Hot water kettle", "Shared patio or balcony", "Smoke alarm", "Freezer", "Hot water", "Heating", "Oven", "Air conditioning", "Baking sheet", "Hair dryer", "Pets allowed", "Housekeeping - included with your stay", "Dining table", "Dishwasher", "Coffee maker", "Dishes and silverware", "Pool table", "Bathtub", "Microwave", "Exterior security cameras on property", "Kitchen", "Wine glasses", "Movie theater", "Refrigerator", "Exercise equipment", "Shared gym in building", "Stove", "Blender", "Shared sauna", "Washer", "Extra pillows and blankets", "Cooking basics", "Clothing storage", "Hangers"]'
'["Wifi", "Dryer", "Essentials", "Cleaning products", "Ethernet connection", "Elevator", "Host greets you", "Long term stays allowed", "Toaster", "Iron", "TV", "Bed linens", "Hot water kettle", "Smoke alarm", "Freezer", "Hot water", "Heating", "Oven", "Air conditioning", "Hair dryer", "Coffee", "Luggage dropoff allowed", "Dishwasher", "Coffee maker", "Dishes and silverware", "Microwave", "Patio or balcony", "Conditioner", "Private entrance", "Kitchen", "Refrigerator", "Body soap", "Paid parking off premises", "High chair", "Shampoo", "Stove", "Carbon monoxide alarm", "Washer", "Extra pillows and blankets", "Cooking basics", "Clothing storage", "Hangers"]']
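Each `amenities` value above is a JSON array stored as a single string, so it cannot be compared or counted directly. A small sketch of one way to decode it is shown below; the helper name `parse_amenities` and the derived features (`amenity_count`, `has_wifi`) are illustrative assumptions rather than steps taken from this notebook.

```python
import json

import pandas as pd


def parse_amenities(raw):
    """Decode one amenities cell (a JSON array stored as text) into a list."""
    # Missing cells become an empty list so later counts do not fail.
    if pd.isna(raw):
        return []
    return json.loads(raw)


# Shortened sample values in the same format as the output above.
sample = pd.Series([
    '["TV", "Wifi", "Kitchen"]',
    '["Wifi", "Free washer \\u2013 In unit"]',
])

amenity_lists = sample.apply(parse_amenities)
amenity_count = amenity_lists.apply(len)
has_wifi = amenity_lists.apply(lambda items: "Wifi" in items)
```

Decoding with `json.loads` also turns escape sequences such as `\u2013` into the actual characters, which avoids treating the same amenity as two different strings.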
price = [nan '$172.00 ' '$75.00 ' '$79.00 ' '$126.00 ' '$148.00 ' '$90.00 '
'$163.00 ' '$50.00 ' '$309.00 ' '$66.00 ' '$129.00 ' '$84.00 ' '$250.00 '
'$295.00 ' '$92.00 ' '$300.00 ' '$322.00 ' '$80.00 ' '$200.00 ' '$44.00 '
'$60.00 ' '$280.00 ' '$100.00 ' '$99.00 ' '$288.00 ' '$361.00 '
'$115.00 ' '$30.00 ' '$62.00 ' '$69.00 ' '$55.00 ' '$279.00 ' '$106.00 '
'$110.00 ' '$108.00 ' '$399.00 ' '$97.00 ' '$324.00 ' '$65.00 '
'$149.00 ' '$119.00 ' '$45.00 ' '$150.00 ' '$120.00 ' '$190.00 '
'$83.00 ' '$95.00 ' '$180.00 ' '$500.00 ' '$116.00 ' '$145.00 '
'$444.00 ' '$440.00 ' '$271.00 ' '$278.00 ' '$98.00 ' '$88.00 '
'$1,000.00 ' '$87.00 ' '$196.00 ' '$475.00 ' '$470.00 ' '$350.00 '
'$121.00 ' '$160.00 ' '$130.00 ' '$125.00 ' '$439.00 ' '$78.00 '
'$225.00 ' '$255.00 ' '$222.00 ' '$186.00 ' '$77.00 ' '$275.00 '
'$71.00 ' '$135.00 ' '$131.00 ' '$72.00 ' '$166.00 ' '$152.00 '
'$140.00 ' '$214.00 ' '$156.00 ' '$101.00 ' '$91.00 ' '$168.00 '
'$396.00 ' '$85.00 ' '$93.00 ' '$187.00 ' '$128.00 ' '$220.00 '
'$249.00 ' '$59.00 ' '$449.00 ' '$170.00 ' '$499.00 ' '$330.00 '
'$155.00 ' '$218.00 ' '$109.00 ' '$269.00 ' '$175.00 ' '$236.00 '
'$265.00 ' '$53.00 ' '$146.00 ' '$153.00 ' '$86.00 ' '$173.00 ' '$81.00 '
'$57.00 ' '$76.00 ' '$326.00 ' '$302.00 ' '$167.00 ' '$216.00 '
'$162.00 ' '$105.00 ' '$259.00 ' '$185.00 ' '$157.00 ' '$888.00 '
'$246.00 ' '$237.00 ' '$598.00 ' '$111.00 ' '$89.00 ' '$61.00 '
'$268.00 ' '$230.00 ' '$370.00 ' '$349.00 ' '$143.00 ' '$550.00 '
'$64.00 ' '$70.00 ' '$224.00 ' '$94.00 ' '$999.00 ' '$364.00 ' '$191.00 '
'$138.00 ' '$179.00 ' '$264.00 ' '$124.00 ' '$67.00 ' '$400.00 '
'$198.00 ' '$282.00 ' '$489.00 ' '$137.00 ' '$799.00 ' '$254.00 '
'$258.00 ' '$338.00 ' '$63.00 ' '$852.00 ' '$313.00 ' '$54.00 '
'$189.00 ' '$134.00 ' '$659.00 ' '$96.00 ' '$39.00 ' '$690.00 '
'$341.00 ' '$182.00 ' '$895.00 ' '$375.00 ' '$229.00 ' '$445.00 '
'$164.00 ' '$335.00 ' '$122.00 ' '$266.00 ' '$139.00 ' '$169.00 '
'$383.00 ' '$42.00 ' '$58.00 ' '$900.00 ' '$112.00 ' '$68.00 ' '$133.00 '
'$141.00 ' '$240.00 ' '$405.00 ' '$123.00 ' '$299.00 ' '$307.00 '
'$118.00 ' '$48.00 ' '$213.00 ' '$773.00 ' '$104.00 ' '$298.00 '
'$102.00 ' '$41.00 ' '$273.00 ' '$29.00 ' '$485.00 ' '$385.00 '
'$352.00 ' '$40.00 ' '$147.00 ' '$286.00 ' '$233.00 ' '$434.00 '
'$195.00 ' '$320.00 ' '$540.00 ' '$208.00 ' '$113.00 ' '$491.00 '
'$245.00 ' '$82.00 ' '$221.00 ' '$136.00 ' '$241.00 ' '$270.00 '
'$33.00 ' '$35.00 ' '$316.00 ' '$339.00 ' '$293.00 ' '$142.00 '
'$248.00 ' '$450.00 ' '$424.00 ' '$580.00 ' '$205.00 ' '$199.00 '
'$291.00 ' '$161.00 ' '$51.00 ' '$325.00 ' '$210.00 ' '$235.00 '
'$398.00 ' '$890.00 ' '$403.00 ' '$165.00 ' '$127.00 ' '$171.00 '
'$528.00 ' '$244.00 ' '$204.00 ' '$337.00 ' '$56.00 ' '$329.00 '
'$151.00 ' '$38.00 ' '$289.00 ' '$215.00 ' '$599.00 ' '$262.00 '
'$547.00 ' '$183.00 ' '$47.00 ' '$1,828.00 ' '$194.00 ' '$263.00 '
'$52.00 ' '$586.00 ' '$549.00 ' '$46.00 ' '$277.00 ' '$260.00 '
'$514.00 ' '$73.00 ' '$178.00 ' '$321.00 ' '$103.00 ' '$203.00 '
'$1,451.00 ' '$297.00 ' '$411.00 ' '$379.00 ' '$281.00 ' '$829.00 '
'$36.00 ' '$242.00 ' '$211.00 ' '$750.00 ' '$2,000.00 ' '$318.00 '
'$74.00 ' '$590.00 ' '$914.00 ' '$421.00 ' '$5,000.00 ' '$219.00 '
'$1,200.00 ' '$480.00 ' '$132.00 ' '$285.00 ' '$296.00 ' '$417.00 '
'$414.00 ' '$529.00 ' '$456.00 ' '$251.00 ' '$239.00 ' '$395.00 '
'$154.00 ' '$595.00 ' '$256.00 ' '$114.00 ' '$345.00 ' '$600.00 '
'$380.00 ' '$49.00 ' '$238.00 ' '$247.00 ' '$3,500.00 ' '$428.00 '
'$43.00 ' '$290.00 ' '$371.00 ' '$28.00 ' '$177.00 ' '$159.00 '
'$212.00 ' '$10,000.00 ' '$234.00 ' '$158.00 ' '$328.00 ' '$538.00 '
'$184.00 ' '$197.00 ' '$957.00 ' '$107.00 ' '$756.00 ' '$431.00 '
'$721.00 ' '$176.00 ' '$2,026.00 ' '$856.00 ' '$231.00 ' '$202.00 '
'$358.00 ' '$369.00 ' '$310.00 ' '$764.00 ' '$995.00 ' '$276.00 '
'$315.00 ' '$181.00 ' '$37.00 ' '$629.00 ' '$886.00 ' '$201.00 '
'$577.00 ' '$569.00 ' '$232.00 ' '$253.00 ' '$800.00 ' '$929.00 '
'$243.00 ' '$9,999.00 ' '$206.00 ' '$553.00 ' '$303.00 ' '$217.00 '
'$314.00 ' '$783.00 ' '$571.00 ' '$393.00 ' '$31.00 ' '$117.00 '
'$207.00 ' '$579.00 ' '$1,895.00 ' '$34.00 ' '$636.00 ' '$344.00 '
'$663.00 ' '$589.00 ' '$625.00 ' '$925.00 ' '$465.00 ' '$272.00 '
'$228.00 ' '$493.00 ' '$536.00 ' '$188.00 ' '$407.00 ' '$427.00 '
'$404.00 ' '$593.00 ' '$283.00 ' '$413.00 ' '$453.00 ' '$526.00 '
'$546.00 ' '$466.00 ' '$515.00 ' '$679.00 ' '$343.00 ' '$591.00 '
'$257.00 ' '$174.00 ' '$850.00 ' '$495.00 ' '$736.00 ' '$419.00 '
'$420.00 ' '$306.00 ' '$647.00 ' '$356.00 ' '$394.00 ' '$144.00 '
'$467.00 ' '$267.00 ' '$430.00 ' '$32.00 ' '$363.00 ' '$455.00 '
'$1,656.00 ' '$323.00 ' '$418.00 ' '$412.00 ' '$426.00 ' '$710.00 '
'$333.00 ' '$360.00 ' '$684.00 ' '$274.00 ' '$252.00 ' '$192.00 '
'$209.00 ' '$877.00 ' '$1,827.00 ' '$772.00 ' '$359.00 ' '$950.00 '
'$362.00 ' '$714.00 ' '$699.00 ' '$785.00 ' '$15.00 ' '$457.00 '
'$12,400.00 ' '$649.00 ' '$226.00 ' '$389.00 ' '$376.00 ' '$700.00 '
'$429.00 ' '$406.00 ' '$284.00 ' '$1,085.00 ' '$336.00 ' '$655.00 '
'$384.00 ' '$386.00 ' '$292.00 ' '$937.00 ' '$3,570.00 ' '$193.00 '
'$342.00 ' '$223.00 ' '$261.00 ' '$608.00 ' '$425.00 ' '$438.00 '
'$650.00 ' '$319.00 ' '$3,000.00 ' '$27.00 ' '$304.00 ' '$739.00 '
'$760.00 ' '$2,050.00 ' '$1,151.00 ' '$459.00 ' '$433.00 ' '$19.00 '
'$3.00 ' '$1,999.00 ' '$693.00 ' '$473.00 ' '$354.00 ' '$423.00 '
'$340.00 ' '$317.00 ' '$401.00 ' '$25.00 ' '$585.00 ' '$509.00 '
'$22.00 ' '$920.00 ' '$507.00 ' '$442.00 ' '$1,351.00 ' '$441.00 '
'$451.00 ' '$698.00 ' '$725.00 ' '$381.00 ' '$357.00 ' '$656.00 '
'$792.00 ' '$365.00 ' '$1,500.00 ' '$334.00 ' '$347.00 ' '$504.00 '
'$227.00 ' '$287.00 ' '$377.00 ' '$657.00 ' '$368.00 ' '$539.00 '
'$1,400.00 ' '$1,295.00 ' '$355.00 ' '$415.00 ' '$372.00 ' '$382.00 '
'$443.00 ' '$820.00 ' '$20.00 ' '$986.00 ' '$402.00 ' '$374.00 '
'$2,150.00 ' '$331.00 ' '$479.00 ' '$1,199.00 ' '$408.00 ' '$745.00 '
'$825.00 ' '$447.00 ' '$378.00 ' '$564.00 ' '$594.00 ' '$749.00 '
'$662.00 ' '$711.00 ' '$532.00 ' '$1,542.00 ' '$1,165.00 ' '$305.00 '
'$471.00 ' '$543.00 ' '$478.00 ' '$689.00 ' '$410.00 ' '$1,795.00 '
'$484.00 ' '$849.00 ' '$542.00 ' '$294.00 ' '$611.00 ' '$541.00 '
'$332.00 ' '$446.00 ' '$522.00 ' '$1,171.00 ' '$555.00 ' '$498.00 '
'$587.00 ' '$530.00 ' '$592.00 ' '$26.00 ' '$628.00 ' '$842.00 '
'$523.00 ' '$448.00 ' '$346.00 ' '$943.00 ' '$915.00 ' '$21.00 '
'$390.00 ' '$14.00 ' '$734.00 ' '$614.00 ' '$642.00 ' '$694.00 '
'$581.00 ' '$841.00 ' '$460.00 ' '$748.00 ' '$391.00 ' '$605.00 '
'$437.00 ' '$18.00 ' '$436.00 ' '$1,160.00 ' '$557.00 ' '$327.00 '
'$494.00 ' '$678.00 ' '$521.00 ' '$621.00 ' '$828.00 ' '$533.00 '
'$519.00 ' '$819.00 ' '$676.00 ' '$827.00 ' '$845.00 ' '$1,850.00 '
'$476.00 ' '$397.00 ' '$560.00 ' '$373.00 ' '$311.00 ' '$308.00 '
'$818.00 ' '$524.00 ' '$2,500.00 ' '$301.00 ' '$351.00 ' '$746.00 '
'$607.00 ' '$795.00 ' '$839.00 ' '$791.00 ' '$388.00 ' '$1,365.00 '
'$837.00 ' '$789.00 ' '$545.00 ' '$612.00 ' '$588.00 ' '$416.00 '
'$367.00 ' '$4,000.00 ' '$481.00 ' '$312.00 ' '$469.00 ' '$948.00 '
'$762.00 ' '$692.00 ' '$4,500.00 ' '$686.00 ' '$387.00 ' '$670.00 '
'$1,335.00 ' '$1,183.00 ' '$574.00 ' '$604.00 ' '$1,515.00 ' '$1,450.00 '
'$664.00 ' '$742.00 ' '$1,439.00 ' '$624.00 ' '$525.00 ' '$671.00 '
'$666.00 ' '$606.00 ' '$511.00 ' '$353.00 ' '$483.00 ' '$366.00 '
'$1,057.00 ' '$981.00 ' '$462.00 ' '$568.00 ' '$464.00 ' '$510.00 '
'$794.00 ' '$730.00 ' '$8,002.00 ' '$775.00 ' '$905.00 ' '$761.00 '
'$558.00 ' '$1,415.00 ' '$646.00 ' '$573.00 ' '$1,014.00 ' '$1,750.00 '
'$575.00 ' '$674.00 ' '$501.00 ' '$1,185.00 ' '$899.00 ' '$497.00 '
'$554.00 ' '$4,119.00 ' '$1,430.00 ' '$781.00 ' '$1,603.00 ' '$853.00 '
'$1,040.00 ' '$616.00 ' '$731.00 ' '$1,100.00 ' '$1,426.00 ' '$615.00 '
'$1,143.00 ' '$1,600.00 ' '$409.00 ' '$1,300.00 ' '$720.00 ' '$776.00 '
'$584.00 ' '$840.00 ' '$613.00 ' '$552.00 ' '$643.00 ' '$988.00 '
'$1,520.00 ' '$713.00 ' '$392.00 ' '$435.00 ' '$1,177.00 ' '$348.00 '
'$1,279.00 ' '$520.00 ' '$871.00 ' '$518.00 ' '$562.00 ' '$946.00 '
'$637.00 ' '$610.00 ' '$665.00 ' '$4,310.00 ' '$24.00 ' '$422.00 '
'$717.00 ' '$548.00 ' '$959.00 ' '$630.00 ' '$924.00 ' '$486.00 '
'$640.00 ' '$488.00 ' '$1,252.00 ' '$474.00 ' '$751.00 ' '$534.00 '
'$517.00 ' '$998.00 ' '$487.00 ' '$8,000.00 ' '$513.00 ' '$790.00 '
'$780.00 ' '$490.00 ' '$8,820.00 ' '$1,384.00 ' '$1,352.00 ' '$2,200.00 '
'$477.00 ' '$960.00 ' '$620.00 ' '$744.00 ' '$722.00 ' '$458.00 '
'$836.00 ' '$1,121.00 ' '$738.00 ' '$461.00 ' '$970.00 ' '$516.00 '
'$857.00 ' '$506.00 ' '$875.00 ' '$1,757.00 ' '$556.00 ' '$945.00 '
'$705.00 ' '$983.00 ' '$609.00 ' '$576.00 ' '$787.00 ' '$672.00 '
'$661.00 ' '$535.00 ' '$1,220.00 ' '$1,770.00 ' '$578.00 ' '$454.00 '
'$851.00 ' '$527.00 ' '$1,343.00 ' '$531.00 ' '$570.00 ' '$12.00 '
'$432.00 ' '$502.00 ' '$811.00 ' '$468.00 ' '$864.00 ' '$846.00 '
'$561.00 ' '$2,805.00 ' '$866.00 ' '$2,650.00 ' '$916.00 ' '$23.00 '
'$1,350.00 ' '$685.00 ' '$1,465.00 ' '$1,059.00 ' '$1,114.00 ' '$902.00 '
'$796.00 ' '$833.00 ']
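The `price` values above are strings with a leading dollar sign, thousands separators, and a trailing space, alongside `nan`. A minimal sketch of converting them to numbers is given below, assuming the goal is a float column; the variable names are hypothetical.

```python
import pandas as pd

# Sample values copied from the output above, including a missing price.
raw_price = pd.Series([float("nan"), "$172.00 ", "$1,000.00 ", "$12,400.00 "])

# Strip whitespace, drop '$' and ',', then convert; unparsable values become NaN.
price_num = pd.to_numeric(
    raw_price.str.strip().str.replace(r"[$,]", "", regex=True),
    errors="coerce",
)
```

Once numeric, extreme values such as 3.00 or 12,400.00 in the list above can be inspected with ordinary comparisons before deciding how to treat them.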
minimum_nights = [ 28 180 90 750 91 120 150 3 85 31 18 2 29 30
5 1 183 10 4 365 80 200 60 7 13 100 14 32
700 21 12 40 240 56 45 210 6 89 1124 300 185 250
366 88 62 1000 84 1125 119 74 8 135 20 22 730 360
57 75 99 175 181 179 140 184 168 160 9 500 65 299
92 35 50 333 110 450 59 58 182 359 239 128 137 124
375 55 121 220 114 130 47 15 42 364 49 1120 600 1100
153 44 170 358 64 155 228 33 270 25 174 330 34]
maximum_nights = [ 730 365 1125 90 1100 125 30 180 400 31 500 60
162 182 150 1124 50 270 112 366 1000 7 120 99
28 100 14 160 250 460 999 108 19 700 35 360
130 15 21 64 56 10 190 3 300 48 45 62
179 4 260 27 600 200 17 20 364 1123 80 3000
32 33 888 52 285 9 40 93 29 1095 900 2
102 75 61 601 5 122 72 352 38 650 42 95
555 16 91 22 240 12 85 135 71 1114 1111 88
87 36 731 121 2000 550 70 65 25 3650 450 92
375 185 665 356 181 729 222 67 777 187 26 800
829 55 399 96 210 139 395 6 53 168 73 186
58 101 380 44 46 355 10001 666 362 720 350 68
89 13 115 1121 123 979 105 214 340 165 59 220
41 175 280 265 11 183 342 8 18 34 51 110
140 84 320 74 193 63 49 114 201 225 725 590
39 184 79 94 236 69 124 37 178 106 170 367
330 23 290 1120 325 188 104 245 24 465 152 128
111 368 66 393 281 370 275 1 161 149 145 155
230 279 118 138 117 369 176 420 97 336 239 358
326 335 299 127 189 361 363 346 98 307 177 83
195]
minimum_minimum_nights = [ 28 180 90 750 91 120 150 1 60 10 185 18 2 29
30 5 183 31 3 4 365 80 32 7 13 100 14 700
21 12 40 240 56 210 6 89 36 45 1124 300 250 366
88 62 1000 84 500 1125 119 74 8 135 22 730 92 50
360 57 75 99 175 181 179 140 184 168 65 178 35 85
333 110 350 59 19 53 160 9 200 58 182 359 103 239
128 137 42 124 375 55 121 220 114 130 47 280 15 49
20 16 1120 600 1100 153 44 170 358 64 155 228 33 270
364 25 174 330 34]
maximum_minimum_nights = [ 28 180 90 750 91 120 150 4 85 185 18 2 29 30
5 183 31 10 3 1 365 80 200 60 7 13 100 14
32 700 21 12 40 240 56 6 210 89 133 45 55 1124
300 250 366 88 62 1000 84 552 1125 119 74 8 135 22
730 50 360 57 75 99 175 181 179 58 140 184 168 349
9 95 500 65 178 92 35 333 110 59 160 106 182 359
280 239 128 137 124 375 121 220 114 130 20 47 235 61
290 15 42 364 49 222 1120 600 225 1100 153 44 170 358
27 64 155 228 33 39 270 69 25 174 330 350 16 34
24]
minimum_maximum_nights = [ 730 365 1125 90 1100 125 30 180 400 31 500 60
162 182 150 1124 50 270 112 366 1000 7 120 99
28 100 14 160 4 250 460 999 108 19 700 35
360 130 21 64 56 10 190 3 300 48 62 179
260 27 600 200 17 20 3000 32 45 33 888 52
285 9 40 93 29 1095 900 2 102 75 61 601
5 364 72 352 38 650 42 95 555 16 91 240
12 85 135 71 1114 88 87 36 731 121 2000 550
70 122 65 25 3650 450 92 80 375 185 15 356
181 67 187 26 800 829 55 399 96 210 139 1
395 6 168 73 186 58 380 44 46 10001 666 362
350 68 89 13 1123 115 53 979 105 720 214 340
59 220 41 280 265 183 355 342 8 18 34 51
110 140 320 74 193 63 11 84 201 225 241 590
39 184 725 94 236 124 37 106 170 367 330 23
1120 290 188 79 175 465 128 111 368 152 393 281
370 161 149 24 145 66 155 729 230 279 118 138
245 117 159 369 176 420 97 239 22 358 326 335
165 123 127 189 361 363 346 98 307 177 83 178
195]
maximum_maximum_nights = [ 730 365 1125 90 1100 125 30 180 400 31 500 60
162 182 150 1124 50 270 112 366 1000 7 120 99
28 100 14 160 250 460 999 108 19 700 35 360
130 21 64 56 10 190 3 300 48 62 179 4
260 27 600 200 17 20 3000 32 45 33 888 52
285 9 40 93 29 1095 900 2 102 75 61 601
5 364 72 352 38 650 42 95 555 16 91 240
12 85 135 71 1114 88 87 36 731 121 2000 550
70 122 65 25 3650 450 92 80 375 185 15 356
181 67 187 26 800 829 55 399 96 210 139 395
6 168 73 186 58 380 44 46 10001 666 362 350
68 89 13 1123 115 979 105 720 214 340 59 220
41 280 265 183 355 342 8 18 34 51 110 140
320 74 193 63 11 84 201 225 590 39 184 725
94 236 124 37 106 170 367 330 23 1120 290 188
79 175 465 128 111 368 152 393 281 370 1 161
149 24 145 66 155 729 230 279 118 138 245 117
369 176 420 97 239 22 358 326 335 165 123 127
189 361 363 346 98 307 177 83 178 195]
minimum_nights_avg_ntm = [2.800e+01 1.800e+02 9.000e+01 7.500e+02 9.100e+01 1.200e+02 1.500e+02
3.900e+00 6.140e+01 2.760e+01 1.850e+02 1.800e+01 2.000e+00 2.900e+01
3.000e+01 5.000e+00 1.880e+01 1.830e+02 3.100e+01 3.300e+00 9.900e+00
3.000e+00 2.830e+01 1.000e+00 4.000e+00 1.350e+01 6.200e+00 3.650e+02
8.000e+01 1.230e+01 1.669e+02 6.000e+01 2.850e+01 7.000e+00 1.300e+01
1.000e+02 1.400e+01 3.200e+01 7.000e+02 1.100e+00 2.900e+00 2.100e+01
1.200e+01 1.640e+01 4.000e+01 2.400e+02 9.300e+00 1.400e+00 2.770e+01
5.600e+01 5.840e+01 5.100e+00 2.100e+02 8.410e+01 6.000e+00 2.890e+01
8.900e+01 1.000e+01 1.500e+00 1.247e+02 1.300e+00 4.300e+00 1.280e+01
1.590e+01 1.960e+01 4.500e+01 3.500e+01 1.124e+03 3.000e+02 2.910e+01
2.300e+00 2.500e+02 3.660e+02 2.600e+00 2.810e+01 8.800e+01 6.200e+01
6.300e+00 4.800e+00 6.900e+00 2.870e+01 1.000e+03 4.900e+00 6.930e+01
8.400e+01 5.433e+02 1.125e+03 2.100e+00 1.190e+02 7.400e+01 1.210e+01
1.688e+02 8.420e+01 8.000e+00 1.350e+02 2.200e+01 7.300e+02 4.710e+01
2.400e+00 1.148e+02 5.000e+01 8.440e+01 3.600e+02 5.700e+01 8.630e+01
1.700e+00 7.500e+01 9.900e+01 1.750e+02 1.810e+02 4.280e+01 2.720e+01
3.040e+01 5.300e+01 2.180e+01 2.790e+01 2.750e+01 2.700e+00 1.790e+02
5.750e+01 1.290e+01 3.600e+00 1.340e+01 5.140e+01 3.430e+01 1.450e+01
2.560e+01 5.670e+01 1.400e+02 5.760e+01 5.500e+00 1.840e+02 1.680e+02
1.410e+01 3.442e+02 5.720e+01 7.080e+01 9.000e+00 4.940e+01 5.000e+02
1.780e+01 4.100e+00 5.430e+01 7.610e+01 4.090e+01 7.700e+00 8.540e+01
6.500e+01 4.400e+00 2.690e+01 1.780e+02 3.200e+00 9.200e+01 4.670e+01
4.200e+00 1.760e+01 1.900e+00 5.200e+00 9.560e+01 2.330e+01 2.000e+01
8.500e+01 8.900e+00 1.860e+01 3.290e+01 3.330e+02 1.010e+01 5.730e+01
1.200e+00 6.500e+00 5.540e+01 1.100e+02 1.490e+01 1.650e+01 2.950e+01
3.629e+02 5.900e+01 2.990e+01 5.910e+01 5.440e+01 5.330e+01 6.900e+01
8.600e+01 3.350e+01 2.980e+01 2.240e+01 1.600e+02 2.200e+00 4.730e+01
3.090e+01 8.700e+00 2.840e+01 2.860e+01 2.914e+02 3.400e+00 3.070e+01
1.430e+01 2.660e+01 1.970e+01 9.100e+00 3.020e+01 6.990e+01 3.800e+01
1.380e+01 5.800e+00 2.800e+00 8.600e+00 5.740e+01 3.080e+01 1.800e+00
5.550e+01 5.800e+01 4.250e+01 2.000e+02 3.710e+01 5.400e+00 1.820e+02
6.400e+00 3.590e+02 3.100e+00 2.700e+01 9.400e+00 5.780e+01 1.578e+02
2.329e+02 2.390e+02 1.280e+02 2.390e+01 2.970e+01 1.370e+02 2.280e+01
3.330e+01 4.600e+01 2.540e+01 3.590e+01 3.210e+01 1.092e+02 8.510e+01
7.200e+00 1.240e+02 5.600e+00 6.760e+01 3.750e+02 3.010e+01 5.640e+01
5.500e+01 1.210e+02 8.460e+01 4.540e+01 2.200e+02 1.140e+02 4.410e+01
8.230e+01 4.330e+01 5.690e+01 1.300e+02 1.065e+02 1.540e+01 2.820e+01
2.380e+01 1.070e+01 4.360e+01 1.140e+01 5.850e+01 3.060e+01 8.090e+01
4.700e+01 1.070e+02 1.955e+02 6.600e+00 2.590e+01 4.240e+01 3.860e+01
2.450e+01 2.270e+01 3.260e+01 4.700e+00 6.330e+01 1.160e+01 9.740e+01
1.910e+01 1.470e+01 1.250e+01 2.800e+02 7.220e+01 2.738e+02 4.600e+00
3.900e+01 1.500e+01 8.300e+01 4.200e+01 8.110e+01 2.696e+02 5.490e+01
9.180e+01 2.930e+01 1.038e+02 4.900e+01 2.564e+02 7.320e+01 1.950e+01
5.520e+01 1.600e+00 1.870e+01 3.120e+01 4.920e+01 1.055e+02 8.010e+01
8.860e+01 4.390e+01 2.670e+01 9.980e+01 1.025e+02 1.110e+01 8.200e+00
3.560e+01 1.120e+03 1.090e+01 1.318e+02 2.550e+01 9.020e+01 5.700e+00
1.040e+01 4.210e+01 2.250e+01 2.090e+01 8.570e+01 1.659e+02 4.500e+00
2.960e+01 3.370e+01 6.000e+02 8.450e+01 6.100e+00 1.120e+01 8.870e+01
2.440e+01 1.077e+02 5.810e+01 4.760e+01 1.683e+02 2.740e+01 1.720e+01
1.100e+03 1.600e+01 7.800e+00 1.180e+01 1.530e+02 1.320e+01 4.400e+01
1.560e+01 1.050e+01 1.129e+02 2.780e+01 2.060e+01 5.680e+01 8.100e+00
3.441e+02 3.700e+00 2.230e+01 7.100e+00 1.310e+01 1.480e+01 9.720e+01
7.740e+01 2.340e+01 1.750e+01 9.800e+00 1.240e+01 1.020e+01 5.410e+01
1.170e+01 1.700e+02 1.580e+01 1.530e+01 1.550e+01 1.150e+01 3.580e+02
2.580e+01 4.470e+01 4.780e+01 4.260e+01 6.400e+01 5.300e+00 1.550e+02
1.461e+02 4.720e+01 2.420e+01 3.800e+00 4.490e+01 2.370e+01 2.500e+00
2.220e+01 3.570e+01 1.796e+02 8.800e+00 4.310e+01 2.360e+01 1.060e+01
2.610e+01 2.430e+01 2.280e+02 2.050e+01 2.310e+01 2.410e+01 5.900e+00
2.210e+01 2.570e+01 2.630e+01 6.800e+00 7.300e+00 3.140e+01 1.710e+01
3.970e+01 4.440e+01 2.110e+01 2.029e+02 1.455e+02 2.020e+01 3.300e+01
1.030e+01 2.400e+01 1.680e+01 1.390e+01 1.460e+01 1.138e+02 1.330e+01
1.775e+02 1.360e+01 1.420e+01 3.500e+00 3.700e+01 1.440e+01 1.724e+02
9.600e+00 2.880e+01 8.300e+00 3.640e+01 2.700e+02 3.220e+01 1.740e+01
1.510e+01 3.550e+01 2.510e+01 8.960e+01 2.620e+01 3.050e+01 1.990e+01
4.380e+01 2.470e+01 9.700e+00 5.660e+01 5.570e+01 7.490e+01 1.049e+02
1.700e+01 1.162e+02 7.810e+01 9.730e+01 2.650e+01 2.520e+01 6.550e+01
1.820e+01 5.310e+01 1.520e+01 1.080e+01 1.190e+01 3.110e+01 1.670e+01
3.470e+01 3.630e+01 2.500e+01 8.520e+01 2.730e+01 1.689e+02 2.300e+01
9.450e+01 1.810e+01 1.980e+01 7.300e+01 3.640e+02 4.320e+01 5.880e+01
1.260e+01 3.540e+01 2.920e+01 1.100e+01 1.655e+02 5.150e+01 7.500e+00
7.800e+01 1.639e+02 3.280e+01 1.610e+01 3.030e+01 2.160e+01 1.740e+02
3.840e+01 6.120e+01 3.395e+02 1.220e+01 3.300e+02 1.840e+01 3.620e+01
2.030e+01 2.120e+01 2.680e+01 1.690e+01 1.543e+02 2.040e+01 5.590e+01
8.250e+01 1.327e+02 3.410e+01 3.334e+02 1.270e+01 1.930e+01 2.130e+01
4.160e+01 8.590e+01 2.290e+01 2.070e+01 2.140e+01 3.940e+01 9.570e+01
2.260e+01 1.067e+02 4.230e+01 8.480e+01 7.880e+01 3.400e+01 6.700e+00
9.430e+01]
maximum_nights_avg_ntm = [7.3000e+02 3.6500e+02 1.1250e+03 9.0000e+01 1.1000e+03 1.2500e+02
3.0000e+01 1.8000e+02 4.0000e+02 3.1000e+01 5.0000e+02 6.0000e+01
1.6200e+02 1.8200e+02 1.5000e+02 1.1240e+03 5.0000e+01 2.7000e+02
1.1200e+02 3.6600e+02 1.0000e+03 7.0000e+00 1.2000e+02 9.9000e+01
2.8000e+01 1.0000e+02 1.4000e+01 1.6000e+02 2.6700e+01 7.8850e+02
2.5000e+02 4.6000e+02 9.9900e+02 1.0800e+02 1.9000e+01 7.0000e+02
3.5000e+01 3.6000e+02 1.3000e+02 1.0264e+03 2.1000e+01 6.4000e+01
6.2590e+02 5.6000e+01 1.0000e+01 1.9000e+02 3.0000e+00 3.0000e+02
4.8000e+01 6.2000e+01 1.7900e+02 4.0000e+00 2.6000e+02 2.7000e+01
6.0000e+02 2.0000e+02 1.7000e+01 2.0000e+01 3.0000e+03 3.2000e+01
4.5000e+01 3.3000e+01 8.8800e+02 5.2000e+01 2.8500e+02 9.0000e+00
4.0000e+01 9.3000e+01 2.9000e+01 1.0950e+03 9.0000e+02 2.0000e+00
1.0200e+02 7.5000e+01 6.1000e+01 6.0100e+02 5.0000e+00 3.6400e+02
7.2000e+01 3.5200e+02 3.8000e+01 6.5000e+02 4.2000e+01 9.5000e+01
5.5500e+02 1.6000e+01 9.1000e+01 1.1172e+03 2.4000e+02 1.2000e+01
8.5000e+01 1.3500e+02 7.1000e+01 1.1140e+03 8.8000e+01 8.7000e+01
3.6000e+01 7.3100e+02 9.8840e+02 1.2100e+02 2.0000e+03 5.5000e+02
7.0000e+01 1.2200e+02 6.5000e+01 2.5000e+01 3.6500e+03 4.5000e+02
9.2000e+01 8.0000e+01 3.7500e+02 3.3140e+02 1.8500e+02 1.5000e+01
3.5600e+02 1.8100e+02 6.7000e+01 3.9830e+02 1.8700e+02 2.6000e+01
8.0000e+02 1.6680e+02 8.2900e+02 5.5000e+01 3.9900e+02 9.6000e+01
2.1000e+02 1.3900e+02 1.0675e+03 7.9080e+02 6.5490e+02 3.9500e+02
6.0000e+00 1.6800e+02 7.3000e+01 6.3160e+02 1.8600e+02 5.8000e+01
1.0966e+03 3.8000e+02 6.5280e+02 6.2970e+02 4.4000e+01 4.6000e+01
1.0001e+04 2.4610e+02 6.6600e+02 1.0970e+03 3.6200e+02 8.1150e+02
3.5000e+02 6.8000e+01 8.9000e+01 9.4100e+02 1.3000e+01 1.1230e+03
1.1500e+02 5.4400e+01 9.7900e+02 1.0500e+02 1.1173e+03 7.2000e+02
2.1400e+02 3.4000e+02 5.9000e+01 2.2000e+02 4.1000e+01 2.8000e+02
2.6500e+02 4.4900e+01 1.8300e+02 3.5500e+02 3.4200e+02 8.0000e+00
5.3660e+02 1.8000e+01 1.0767e+03 3.1950e+02 3.4000e+01 5.1000e+01
1.1000e+02 1.4000e+02 2.6100e+01 3.2000e+02 7.4000e+01 6.8470e+02
1.5670e+02 1.9300e+02 6.3000e+01 1.1000e+01 5.6640e+02 1.1120e+03
8.4000e+01 2.0100e+02 2.2500e+02 3.4640e+02 2.8070e+02 2.9380e+02
3.1830e+02 2.0600e+02 8.5460e+02 8.1380e+02 5.4450e+02 5.9000e+02
3.9000e+01 1.8400e+02 7.2500e+02 3.5710e+02 1.6100e+02 5.2110e+02
9.6020e+02 8.8060e+02 9.1100e+02 9.4000e+01 1.6650e+02 1.5960e+02
1.5890e+02 4.6290e+02 3.8130e+02 1.6450e+02 2.1930e+02 2.9940e+02
2.3600e+02 8.8890e+02 3.0700e+02 1.4070e+02 1.5600e+02 1.9180e+02
8.6030e+02 1.1199e+03 8.1330e+02 1.1174e+03 1.2400e+02 3.7000e+01
1.0600e+02 1.7000e+02 3.6700e+02 1.0547e+03 1.6560e+02 3.3000e+02
2.3000e+01 1.1200e+03 2.9000e+02 2.9030e+02 1.0945e+03 3.4290e+02
3.3970e+02 7.6430e+02 4.6940e+02 1.8800e+02 1.0996e+03 1.9240e+02
7.9000e+02 7.9000e+01 6.9600e+02 1.3500e+01 7.6780e+02 7.7480e+02
4.1750e+02 6.3290e+02 1.5600e+01 4.8230e+02 1.7500e+02 4.6500e+02
1.2800e+02 1.1100e+02 1.4250e+02 9.6260e+02 2.9500e+01 3.6800e+02
1.5200e+02 3.9300e+02 1.1100e+01 2.8100e+02 3.7000e+02 1.0000e+00
1.4020e+02 1.4900e+02 1.5300e+02 4.2280e+02 7.9490e+02 5.9260e+02
5.9930e+02 4.9110e+02 2.4000e+01 9.2380e+02 6.6620e+02 6.7320e+02
7.8350e+02 7.4750e+02 3.8260e+02 1.0921e+03 1.8300e+01 1.4500e+02
6.6000e+01 2.9590e+02 1.5500e+02 7.2900e+02 2.3000e+02 2.9970e+02
1.1232e+03 9.3030e+02 1.1054e+03 5.6400e+01 2.7900e+02 1.8870e+02
1.1800e+02 1.7250e+02 1.3800e+02 2.4500e+02 1.1700e+02 2.7300e+01
2.6080e+02 1.0422e+03 2.9200e+02 2.4120e+02 2.5980e+02 2.4570e+02
2.6340e+02 2.2620e+02 6.5920e+02 8.1540e+02 5.6370e+02 1.2020e+02
9.7910e+02 2.7530e+02 3.6900e+02 1.8610e+02 3.5670e+02 1.7600e+02
4.2000e+02 8.4660e+02 6.9250e+02 1.0397e+03 9.7000e+01 2.7800e+01
1.1245e+03 3.5690e+02 3.6340e+02 1.0640e+02 3.2850e+02 2.3900e+02
2.8600e+01 1.0555e+03 1.1198e+03 2.0320e+02 2.2000e+01 3.5800e+02
1.0718e+03 3.2600e+02 3.3500e+02 6.1190e+02 1.6500e+02 3.6250e+02
3.8250e+02 2.0520e+02 1.2300e+02 9.3350e+02 4.7470e+02 1.2700e+02
1.7310e+02 1.1220e+02 1.8900e+02 3.0450e+02 3.6100e+02 3.6190e+02
3.3070e+02 9.9030e+02 9.9700e+02 3.8600e+01 1.0864e+03 7.4150e+02
3.1380e+02 2.7310e+02 3.6300e+02 3.4600e+02 9.8000e+01 4.6870e+02
3.8300e+01 9.3170e+02 3.1310e+02 7.2130e+02 1.1149e+03 1.0900e+01
1.6840e+02 1.6470e+02 1.1244e+03 6.9420e+02 3.6850e+02 1.7700e+02
5.8360e+02 1.6490e+02 8.8680e+02 5.8740e+02 8.3000e+01 5.1380e+02
1.7800e+02 1.9500e+02 3.9460e+02]
has_availability = ['t' nan 'f']
availability_30 = [ 0 29 4 1 10 3 8 28 5 23 14 17 30 16 6 11 9 7 24 2 12 13 20 25
22 15 19 21 18 26 27]
availability_60 = [ 0 13 59 34 31 10 3 18 16 58 21 7 35 20 55 26 53 1 5 44 24 17 60 33
47 6 36 11 4 38 40 56 14 41 27 23 28 39 37 12 32 43 25 45 42 2 29 30
46 50 19 51 15 49 9 8 52 48 54 57 22]
availability_90 = [ 0 16 89 57 3 61 4 10 48 46 88 14 51 37 64 38 55 56 83 1 35 74 54 45
90 52 63 77 6 66 33 13 31 30 68 70 9 44 71 5 53 58 69 67 50 65 28 62
41 34 7 43 72 2 27 32 12 40 76 59 80 85 17 42 47 26 11 60 18 21 8 86
73 36 49 20 84 79 23 22 75 24 15 87 19 81 29 39 78 25 82]
availability_365 = [ 0 74 364 57 278 336 248 279 123 259 218 36 323 247 29 229 72 321
363 97 239 141 217 4 64 313 206 236 246 358 1 310 216 349 189 88
144 134 90 151 157 179 135 312 365 142 338 77 174 257 341 187 326 260
79 138 89 154 328 65 10 91 30 158 70 99 56 319 345 54 94 199
128 273 333 249 342 66 230 340 258 69 208 233 41 339 215 309 110 6
43 102 35 129 184 2 280 83 315 14 303 300 293 318 27 307 178 5
124 19 140 287 87 166 334 145 133 139 355 146 262 175 205 108 353 238
198 268 117 329 201 305 11 71 173 51 308 291 220 152 241 266 60 155
264 222 46 20 237 21 296 165 50 225 331 33 23 39 149 191 335 193
180 219 348 234 92 324 119 120 214 47 317 344 232 320 181 286 61 105
244 106 347 298 112 48 116 245 242 93 80 75 161 306 98 301 290 167
118 59 322 346 207 243 53 67 63 325 13 107 164 160 159 359 31 40
131 316 332 95 143 42 24 26 253 251 28 115 351 49 270 263 122 277
281 169 274 337 125 267 81 231 137 68 255 362 224 12 188 182 170 100
190 171 304 111 221 356 132 183 62 15 32 177 275 136 265 17 34 361
127 272 114 299 223 352 252 289 84 210 302 37 228 147 354 212 185 22
150 9 285 357 172 256 261 38 52 156 148 25 86 73 282 104 227 162
311 330 3 103 200 16 121 168 7 18 204 196 44 295 271 360 58 250
85 8 269 292 197 283 211 288 254 130 276 101 203 96 153 45 195 55
78 76 163 186 226 82 284 314 240 327 176 113 294 235 109 343 213 350
194 202 126 297 192 209]
number_of_reviews = [ 6 169 42 30 1 113 8 67 89 23 61 24 18 4
162 12 84 11 0 63 53 129 56 22 15 39 76 5
38 45 103 54 66 37 122 85 65 86 9 829 21 43
126 47 34 74 101 115 185 79 188 10 2 29 87 7
3 20 27 14 75 211 532 148 91 605 40 136 13 331
32 35 73 238 16 170 51 44 152 688 516 100 120 52
112 50 133 137 64 55 60 26 116 33 19 49 219 199
59 41 425 110 128 470 445 127 82 172 68 70 125 194
81 151 173 222 613 269 95 248 80 90 593 244 431 105
182 17 106 559 78 98 227 327 93 57 28 376 36 25
71 180 191 161 393 108 243 131 175 489 203 228 119 533
384 252 141 121 48 209 96 268 285 159 31 812 158 58
88 69 265 97 149 46 83 305 279 92 72 147 592 118
62 163 367 167 171 183 155 111 258 251 77 525 379 145
482 232 311 213 292 201 504 304 314 349 340 179 234 380
332 166 160 146 102 140 202 193 259 192 456 132 286 436
181 382 368 178 139 174 464 164 507 226 154 94 218 190
316 427 107 282 177 348 403 449 187 198 245 109 215 143
255 329 135 104 271 229 189 354 246 224 323 184 372 247
334 208 168 365 144 197 99 157 299 242 411 230 280 270
210 424 250 124 176 418 114 404 573 261 260 256 196 257
223 335 447 214 117 150 277 457 297 123 134 337 322 267
317 212 221 142 156 306 231 676 399 138 339 130 296 439
333 235 220 387 338 266 324 233 350 352 715 308 274 254
524 1116 272 300 357 846 153 195 186 326 343 385 452 298
240 273 346 264 313 206 344 294 426 390 310 275 690 494
569 241 216 281 204 307 468 315 289 405 600 200 353 303
432 361 239 378 342 607 541 586 165 459 263 395 448 225
249 413 287 394 205 325 392 347 236 441 237 415 375 589
542 531 278 443 356 421 309 207 359 253 301 276]
number_of_reviews_ltm = [ 0 2 1 4 3 13 10 7 16 47 5 6 22 70 18 115 27 48
19 64 28 29 12 9 8 11 30 113 15 57 52 50 60 20 82 44
14 33 43 81 49 56 34 35 21 23 24 26 39 53 17 45 38 66
46 40 69 62 32 37 25 41 72 36 59 67 80 31 54 63 79 89
61 42 109 73 140 55 71 51 98 116 101 96 75 58 126 136 77 85
65 90 76 86 93 176 103 122 118 94 91 112 78 68 99 108 74 114
92 106 84 83 100 97 88 110 123 129 117 142 120]
number_of_reviews_l30d = [ 0 1 4 3 8 2 5 6 7 10 11 9 14 12 18 15 13 16]
first_review = ['7/19/2015' '8/20/2009' '1/5/2011' ... '8/21/2024' '9/5/2024' '8/14/2024']
last_review = ['8/7/2017' '8/27/2013' '9/1/2023' ... '12/14/2023' '1/30/2024'
'2/26/2024']
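The `first_review` and `last_review` values above are month/day/year strings. A short sketch of parsing them into datetimes follows, assuming that format holds for every non-missing row; the derived `review_span_days` feature is an illustrative assumption.

```python
import pandas as pd

# Sample values copied from the output above.
first_review = pd.Series(["7/19/2015", "8/20/2009", "1/5/2011"])
last_review = pd.Series(["8/7/2017", "8/27/2013", "9/1/2023"])

# An explicit format avoids day/month ambiguity; bad values become NaT.
first_dt = pd.to_datetime(first_review, format="%m/%d/%Y", errors="coerce")
last_dt = pd.to_datetime(last_review, format="%m/%d/%Y", errors="coerce")

# Days between a listing's first and last review.
review_span_days = (last_dt - first_dt).dt.days
```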
review_scores_rating = [5. 4.84 4.79 4.93 4.64 4.75 4.18 4.17 4.42 4.63 4.88 4.85 4.61 4.83
4.94 4.92 4.82 4.55 4.71 nan 4.7 4.95 4.76 4.69 4.8 4.12 4.97 4.5
4.62 4.89 4.21 4.52 4.74 4.77 3.8 4.66 4.87 4.34 4.38 4.86 4.67 4.81
4.22 4. 4.9 4.57 4.47 4.25 4.91 4.73 4.45 4.72 4.96 4.43 4.33 4.6
4.54 4.27 4.59 4.98 4.29 4.56 4.44 4.78 4.58 4.46 4.53 4.4 4.65 4.41
4.99 4.51 4.15 4.68 3. 4.06 4.39 4.48 4.49 4.2 3.2 4.07 3.5 4.31
3.75 4.14 4.26 4.19 4.16 4.37 4.36 2. 4.13 4.24 4.11 4.05 3.43 4.23
3.95 4.28 4.3 4.32 1. 4.1 3.91 4.08 4.35 3.6 3.67 3.38 3.71 2.67
3.78 3.33 3.25 2.5 2.6 4.04 3.96 2.33 3.82 3.93 3.4 3.88 3.83 4.02
3.94 3.89 3.9 3.47 3.7 3.86 3.62 2.75 3.44 4.09 1.5 3.79 3.63 2.43]
review_scores_accuracy = [5. 4.81 4.79 nan 4.65 4.88 4.51 4.49 4.3 4.8 4.96 4.69 4.72 4.75
4.93 4.67 4.7 4.45 4.78 4.95 4.92 4.82 4.36 4.83 4.87 4.97 4.33 4.55
4.89 4.85 4.6 4.71 4.52 4.57 4.91 4.5 4.9 4.56 4.77 4.61 4.84 4.94
4.86 4.98 4.62 4.14 4.68 4.48 4.25 4.44 4.73 4.58 4.99 4.59 4.74 4.76
4. 4.4 4.39 4.63 4.46 4.17 4.64 4.23 3.25 4.31 4.43 4.12 3. 4.42
4.54 4.66 3.8 3.75 4.53 0. 4.38 4.47 3.5 4.13 3.67 4.07 4.29 1.
4.11 4.22 4.37 4.41 4.28 3.57 3.95 4.19 4.34 4.06 4.2 3.88 4.21 3.6
3.56 3.71 4.09 2. 4.15 4.27 3.92 4.24 3.91 4.32 4.18 3.86 2.6 3.96
3.33 2.33 3.83 4.26 3.87 3.4 3.63 2.67 4.05 4.1 4.35 4.16 3.2 2.5
3.93 4.08 2.8 3.85 3.44 3.81 2.29 3.38 3.89]
review_scores_cleanliness = [5. 4.89 4.79 4.87 nan 4.67 4.38 4.03 3.97 3.95 4.44 4.69 4.28 4.5
4.91 4.58 4.86 4.36 4.13 4.88 4.84 4.98 4.96 4.48 4.54 4.76 4.6 4.26
4.09 4.64 4.72 4.77 4.78 4.71 4.11 4.62 4.9 4.4 4.22 4.45 4.92 4.61
4.68 3.7 4. 4.34 4.8 4.83 4.33 4.75 4.85 4.81 4.39 4.93 4.82 3.25
4.57 3.9 4.63 4.66 4.94 4.53 4.29 3.75 4.43 4.47 4.59 3.63 4.35 4.25
4.74 4.27 4.95 4.41 3.8 4.99 4.73 4.31 4.7 4.51 4.65 4.97 2. 3.
4.07 4.23 4.37 4.49 4.55 3.5 3.94 4.42 4.04 4.1 4.17 3.67 4.52 3.88
4.2 1. 4.56 4.24 3.86 4.15 3.93 4.14 4.46 2.2 4.3 3.77 4.06 1.5
0. 2.88 3.47 2.5 4.18 4.32 3.58 4.21 3.92 3.96 3.89 3.73 4.16 3.38
4.08 3.87 4.19 3.43 3.33 3.4 4.12 3.91 3.83 3.2 3.71 3.81 3.06 3.76
3.57 3.82 3.29 3.6 3.56 3.64 2.75 2.67 3.55 2.6 2.33 3.36 3.34 3.62
3.84 2.8 3.69 4.05 3.85 3.22 2.57 3.78]
review_scores_checkin = [5. 4.87 4.64 nan 4.95 4.88 4.79 4.63 4.5 4.8 4.94 4.89 4.92 4.98
4.83 4.99 4.45 4.75 4.78 4.9 4.69 4.84 4.86 4.93 4.97 4.59 4.77 4.76
4.91 4.67 4.96 4.81 4.6 4.85 4.36 4.44 4.82 4.29 4.33 4. 4.73 4.74
4.68 4.38 4.49 3. 4.62 4.53 4.71 4.72 4.61 3.5 4.56 4.7 3.25 4.14
4.25 4.65 4.43 2. 4.48 4.54 4.57 4.47 4.55 4.51 0. 4.39 4.66 1.
4.13 4.46 4.09 4.05 4.58 4.32 4.52 4.17 3.67 4.2 4.27 3.14 4.31 4.42
4.4 3.38 4.28 4.35 3.71 1.67 4.41 4.22 2.75 4.15 4.37 4.34 2.6 3.76
4.12 2.33 4.3 3.73 4.26 4.08 3.4 4.21 2.67 4.11 3.75 4.23 3.33 3.84
4.19 3.6 3.34 3.8 3.88 3.46 4.16 4.18 4.06 4.24 2.5 3.86 3.89 3.83
4.1 3.93 3.92 3.56 3.94 3.97 2.86]
review_scores_communication = [5. 4.9 4.76 nan 4.96 4.63 4.84 4.69 4.8 4.86 4.88 4.67 4.92 4.97
4.95 4.36 4.98 4.91 4.72 4.81 4.89 4.78 4.5 4.94 4.82 4.99 4.93 4.17
4.83 4.6 4.62 4.87 4.79 4.4 4.71 4.55 4.56 4.77 4.43 4.85 4.75 4.7
4.74 4.64 4.34 4.46 4. 4.68 4.73 3.5 3.75 4.14 4.59 4.44 2. 4.47
4.54 4.66 4.31 3.67 4.61 3. 4.29 4.33 4.42 4.52 4.65 4.38 4.58 4.51
4.45 4.25 4.53 4.37 1. 4.23 4.39 4.1 4.28 4.48 3.83 4.27 4.57 3.33
4.32 4.3 4.13 4.2 2.5 4.35 3.71 4.08 4.15 3.25 4.11 4.49 4.18 3.8
3.2 4.26 3.69 3.94 3.9 4.16 4.22 4.41 2.6 2.33 3.88 3.96 3.73 3.4
4.24 3.56 3.63 4.09 3.92 4.21 4.19 3.6 3.78 4.04 1.5 3.86 2.71]
review_scores_location = [5. 4.92 4.86 4.87 nan 4.58 4.88 4.95 4.85 4.75 4.94 4.98 4.54 4.82
4.93 4.76 4.62 4.97 4.81 4.14 4.56 4.8 4.53 4.65 4.67 4.4 4.79 4.55
4.41 4.9 4.64 4.68 4.89 4.17 4.78 4.5 4.73 4.11 4.33 4.74 4.96 4.2
4. 4.91 4.99 4.83 4.6 4.77 4.59 4.71 4.84 4.66 4.42 4.7 4.25 4.69
3.5 4.63 4.57 2.5 4.19 4.21 4.39 4.29 4.31 4.52 4.61 3. 4.43 4.44
4.46 4.72 4.36 3.67 3.75 4.3 4.45 4.47 4.49 4.16 4.08 1. 4.38 3.83
4.22 4.51 3.64 4.32 4.06 4.48 3.97 4.26 2. 4.28 4.18 3.47 4.07 3.85
3.7 4.15 4.23 4.34 4.24 4.27 4.37 4.35 2.6 3.33 3.4 3.56 4.1 4.03
3.9 3.38 3.2 3.57 4.13 3.89 1.5 3.63 3.6 3.82]
review_scores_value = [5. 4.83 4.67 4.87 nan 4.69 4.5 4.23 4.21 4.25 4.75 4.44 4.63 4.92
4.78 4.64 4.38 4.66 4.82 4.7 4.84 4.8 4.68 4.51 4.65 4.94 4.86 4.79
4.36 4.72 4.88 4.6 4.4 4.48 3.71 4.56 4.81 3.83 4.39 4.77 4.24 4.55
4.73 4.54 4.89 4.52 3.67 4.93 4.28 4.85 4. 4.47 4.91 4.33 4.45 4.46
4.96 4.62 4.59 4.74 4.53 4.26 4.06 4.57 4.35 4.95 4.9 4.71 4.76 4.08
4.34 4.2 3.5 4.58 4.97 4.61 4.32 4.17 4.43 4.42 2.75 4.29 3. 4.49
4.3 4.37 4.13 0. 4.07 4.19 4.14 4.41 4.98 3.63 4.12 4.22 1. 3.88
3.86 4.27 3.29 4.18 4.11 3.93 3.91 4.31 4.09 3.99 3.6 4.04 3.75 3.94
3.33 3.4 3.38 2.5 4.02 2. 3.25 3.56 3.85 3.92 4.05 3.8 2.6 4.1
2.33 3.79 3.78 4.99 3.81 3.9 3.89 4.16 3.96 3.7 4.15 2.67 3.82 3.2
3.77 3.44 1.5 3.22 2.43 3.57]
license = [nan 'STR-2009-FXRRPD' 'STR-2303-FPCPHQ' ... 'STR-2405-GRDKVT'
'STR-2305-HSTBHY' 'STR-2308-FKJVHP']
instant_bookable = ['f' 't']
calculated_host_listings_count = [ 1 2 5 4 9 6 3 18 7 101 33 10 16 8 12 15 21 11
17 36 22 24 54 14 13 34 20 62 28 30 25 32 47 19 37 23
51 95 46 92 27]
calculated_host_listings_count_entire_homes = [ 1 5 4 7 0 2 3 12 6 101 10 15 11 9 8 31 22 16
24 54 13 34 20 62 28 30 25 19 17 18 95]
calculated_host_listings_count_private_rooms = [ 0 1 2 4 5 6 21 10 3 9 7 16 8 14 15 11 12 17 39 23 20 13 37 51
29 92 18]
calculated_host_listings_count_shared_rooms = [0 1 2 6 4 3 5 7 8]
reviews_per_month = [5.000e-02 9.200e-01 2.500e-01 1.700e-01 1.000e-02 6.600e-01 6.000e-02
4.000e-01 5.300e-01 1.400e-01 3.600e-01 2.400e-01 1.100e-01 2.000e-02
1.230e+00 8.000e-02 5.200e-01 1.000e-01 nan 3.400e-01 8.200e-01
5.700e-01 9.000e-02 3.500e-01 4.900e-01 4.000e-02 2.800e-01 2.900e-01
7.000e-01 4.300e-01 8.000e-01 5.600e-01 4.400e-01 9.800e-01 7.000e-02
5.470e+00 3.200e-01 8.600e-01 2.300e-01 5.000e-01 6.900e-01 7.900e-01
1.240e+00 5.400e-01 1.200e-01 7.600e-01 1.270e+00 2.700e-01 2.000e-01
5.900e-01 2.100e-01 3.000e-02 1.800e-01 1.900e-01 9.300e-01 6.500e-01
1.490e+00 3.890e+00 1.180e+00 6.800e-01 3.300e-01 4.490e+00 1.500e-01
8.300e-01 9.700e-01 2.480e+00 4.800e-01 3.910e+00 2.600e-01 1.300e-01
8.400e-01 1.710e+00 1.600e-01 1.220e+00 3.700e-01 4.100e-01 1.110e+00
3.100e-01 4.980e+00 3.810e+00 3.900e-01 7.300e-01 1.060e+00 1.100e+00
9.400e-01 7.500e-01 5.500e-01 1.630e+00 2.330e+00 8.800e-01 3.180e+00
9.600e-01 3.510e+00 3.330e+00 4.600e-01 1.380e+00 1.460e+00 6.200e-01
1.770e+00 4.700e-01 1.750e+00 4.670e+00 2.660e+00 2.060e+00 7.200e-01
2.200e-01 2.520e+00 5.100e-01 7.100e-01 1.000e+00 5.220e+00 1.920e+00
4.500e-01 3.400e+00 2.310e+00 1.910e+00 4.470e+00 6.700e-01 1.810e+00
2.670e+00 1.360e+00 6.400e-01 1.020e+00 3.070e+00 1.430e+00 5.800e-01
1.780e+00 7.800e-01 2.020e+00 1.560e+00 1.310e+00 3.230e+00 3.800e-01
1.990e+00 8.700e-01 1.070e+00 1.640e+00 4.010e+00 1.670e+00 1.900e+00
1.050e+00 4.370e+00 1.590e+00 3.150e+00 2.500e+00 2.080e+00 1.160e+00
1.720e+00 1.030e+00 1.120e+00 4.200e-01 6.300e-01 2.230e+00 2.370e+00
1.330e+00 3.000e-01 1.260e+00 6.810e+00 8.100e-01 2.270e+00 1.280e+00
1.130e+00 2.590e+00 9.000e-01 1.250e+00 2.150e+00 5.090e+00 1.090e+00
6.100e-01 2.700e+00 1.940e+00 1.420e+00 1.080e+00 2.350e+00 1.170e+00
1.010e+00 2.200e+00 7.420e+00 4.650e+00 1.370e+00 1.290e+00 4.190e+00
2.720e+00 2.900e+00 2.630e+00 2.960e+00 1.300e+00 4.450e+00 2.770e+00
4.680e+00 3.410e+00 3.010e+00 1.140e+00 1.600e+00 3.370e+00 5.440e+00
1.450e+00 3.550e+00 2.950e+00 1.680e+00 1.470e+00 1.510e+00 9.100e-01
2.690e+00 4.410e+00 1.740e+00 4.080e+00 1.190e+00 1.040e+00 2.560e+00
6.000e-01 2.840e+00 1.650e+00 1.530e+00 1.620e+00 3.450e+00 3.300e+00
3.190e+00 1.520e+00 2.540e+00 4.180e+00 1.480e+00 4.570e+00 2.050e+00
1.150e+00 1.390e+00 1.320e+00 1.550e+00 1.660e+00 2.010e+00 1.820e+00
8.500e-01 4.020e+00 7.400e-01 1.410e+00 2.580e+00 9.500e-01 2.190e+00
3.220e+00 3.730e+00 1.970e+00 4.170e+00 3.080e+00 1.570e+00 1.350e+00
1.850e+00 9.900e-01 1.730e+00 2.470e+00 1.790e+00 7.700e-01 2.320e+00
8.900e-01 1.200e+00 2.730e+00 2.420e+00 2.600e+00 1.580e+00 3.530e+00
2.440e+00 2.170e+00 3.120e+00 3.610e+00 2.410e+00 2.290e+00 3.270e+00
1.930e+00 2.000e+00 2.300e+00 3.680e+00 1.610e+00 2.620e+00 1.950e+00
2.460e+00 2.970e+00 2.400e+00 4.130e+00 2.140e+00 1.830e+00 3.920e+00
1.890e+00 2.130e+00 2.250e+00 3.940e+00 4.600e+00 4.380e+00 2.510e+00
2.070e+00 1.880e+00 6.420e+00 1.870e+00 1.440e+00 4.100e+00 2.390e+00
5.820e+00 2.650e+00 2.530e+00 2.740e+00 3.170e+00 2.030e+00 3.440e+00
4.580e+00 2.240e+00 1.700e+00 2.340e+00 1.540e+00 3.110e+00 4.700e+00
3.050e+00 1.690e+00 3.480e+00 1.840e+00 2.430e+00 2.090e+00 3.280e+00
3.390e+00 2.040e+00 1.400e+00 5.240e+00 2.210e+00 1.760e+00 3.200e+00
7.070e+00 6.580e+00 3.560e+00 1.800e+00 3.670e+00 1.500e+00 2.760e+00
3.760e+00 3.160e+00 2.360e+00 4.740e+00 5.620e+00 2.820e+00 2.100e+00
2.380e+00 4.220e+00 1.960e+00 2.160e+00 2.910e+00 3.700e+00 2.570e+00
4.030e+00 2.550e+00 3.850e+00 7.900e+00 3.030e+00 5.800e+00 1.245e+01
3.060e+00 1.210e+00 3.950e+00 3.350e+00 3.990e+00 2.180e+00 4.270e+00
1.980e+00 4.330e+00 2.260e+00 9.590e+00 3.250e+00 2.120e+00 3.690e+00
4.940e+00 3.840e+00 3.520e+00 3.980e+00 1.860e+00 2.780e+00 2.490e+00
4.060e+00 3.660e+00 2.880e+00 5.180e+00 3.470e+00 4.720e+00 5.060e+00
1.340e+00 2.930e+00 8.380e+00 6.400e+00 6.910e+00 3.420e+00 3.380e+00
9.640e+00 5.890e+00 4.250e+00 3.970e+00 5.100e+00 3.130e+00 7.660e+00
2.640e+00 4.560e+00 5.550e+00 2.990e+00 2.450e+00 3.490e+00 3.630e+00
4.750e+00 3.720e+00 3.600e+00 5.050e+00 8.470e+00 2.710e+00 4.200e+00
7.410e+00 8.340e+00 2.790e+00 4.140e+00 5.530e+00 6.240e+00 3.430e+00
3.460e+00 5.410e+00 4.770e+00 2.830e+00 4.320e+00 2.680e+00 3.290e+00
2.110e+00 2.890e+00 4.970e+00 3.140e+00 3.740e+00 6.250e+00 4.390e+00
6.050e+00 4.610e+00 2.280e+00 2.220e+00 4.160e+00 5.120e+00 4.550e+00
6.130e+00 4.880e+00 2.920e+00 3.040e+00 6.980e+00 5.040e+00 4.820e+00
4.350e+00 3.800e+00 2.610e+00 7.060e+00 3.790e+00 6.700e+00 2.850e+00
7.430e+00 6.460e+00 9.670e+00 9.020e+00 8.730e+00 1.176e+01 3.930e+00
3.780e+00 4.420e+00 5.880e+00 6.340e+00 2.810e+00 6.750e+00 5.280e+00
3.540e+00 3.880e+00 4.850e+00 3.090e+00 9.100e+00 3.360e+00 7.550e+00
8.100e+00 5.570e+00 5.190e+00 6.820e+00 8.080e+00 6.920e+00 4.900e+00
5.140e+00 3.570e+00 6.890e+00 6.320e+00 6.290e+00 4.910e+00 3.340e+00
6.090e+00 4.050e+00 4.710e+00 8.660e+00 6.360e+00 4.430e+00 5.450e+00
3.960e+00 4.780e+00 5.510e+00 4.500e+00 3.310e+00 5.230e+00 7.540e+00
2.940e+00 4.640e+00 6.030e+00 5.600e+00 4.120e+00 2.750e+00 7.960e+00
9.040e+00 4.460e+00 7.760e+00 5.590e+00 5.250e+00 5.970e+00 2.870e+00
5.870e+00 3.260e+00 2.860e+00 3.000e+00 6.150e+00 7.000e+00 3.210e+00
3.580e+00 7.580e+00 3.640e+00 4.090e+00 7.270e+00 3.900e+00 4.070e+00
4.260e+00 8.780e+00 7.670e+00 4.730e+00 7.870e+00 4.520e+00 3.620e+00
4.040e+00 6.470e+00 4.540e+00 4.230e+00 5.400e+00 6.960e+00 2.980e+00
5.650e+00 5.200e+00 6.140e+00 5.940e+00 3.710e+00 4.150e+00 4.290e+00
5.030e+00 7.400e+00 9.130e+00 6.000e+00 3.240e+00 5.700e+00 7.010e+00
6.620e+00 6.930e+00 3.320e+00 1.033e+01 7.100e+00 7.260e+00 8.930e+00
3.820e+00 3.860e+00 4.870e+00 5.310e+00 7.330e+00 3.100e+00 4.400e+00
6.180e+00 8.140e+00 4.300e+00 5.750e+00 2.800e+00 4.510e+00 4.210e+00
5.480e+00 6.280e+00 5.170e+00 4.340e+00 4.590e+00 4.110e+00 5.160e+00
5.000e+00 6.410e+00 6.630e+00 1.071e+01 6.720e+00 3.020e+00 6.270e+00
5.020e+00 7.240e+00 5.080e+00 4.890e+00 8.710e+00 8.510e+00 9.970e+00
7.700e+00 3.750e+00 3.590e+00 6.310e+00 4.360e+00 3.870e+00 5.320e+00
4.000e+00 5.390e+00 9.510e+00 5.460e+00 6.260e+00 5.930e+00 3.500e+00
8.740e+00 1.077e+01 5.690e+00 9.650e+00 8.180e+00 5.980e+00 4.830e+00
6.740e+00 6.190e+00 4.480e+00 3.650e+00 4.800e+00 5.500e+00 4.240e+00
6.660e+00 8.520e+00 5.300e+00 5.810e+00 5.540e+00 6.080e+00 6.690e+00
4.930e+00 4.690e+00 5.840e+00 4.660e+00 5.150e+00 6.640e+00 8.540e+00
5.520e+00 6.800e+00 4.630e+00 6.590e+00 5.670e+00 3.830e+00 5.290e+00
5.210e+00 5.260e+00 1.030e+01 5.430e+00 6.100e+00 8.150e+00 8.000e+00
6.860e+00 4.530e+00 1.080e+01 6.060e+00 9.320e+00 8.460e+00 7.160e+00
7.440e+00 9.230e+00 6.010e+00 1.127e+01 9.290e+00 5.270e+00 5.380e+00
5.920e+00 8.090e+00 6.680e+00 6.170e+00 1.061e+01 5.370e+00 5.010e+00
7.340e+00 6.730e+00 5.340e+00 6.160e+00 6.610e+00 5.640e+00 5.360e+00
5.790e+00 5.110e+00 5.730e+00 5.580e+00 5.490e+00 6.040e+00 7.720e+00
4.760e+00 5.770e+00 4.310e+00 7.170e+00 6.790e+00 6.490e+00 5.720e+00
5.130e+00 4.790e+00 4.620e+00 6.780e+00 5.330e+00 6.210e+00 6.070e+00
5.420e+00 1.088e+01 5.900e+00 5.860e+00 5.710e+00 4.920e+00 5.070e+00
5.560e+00 4.840e+00 9.630e+00 7.280e+00 9.300e+00 7.210e+00 8.210e+00
8.600e+00 6.370e+00 8.980e+00 7.740e+00 6.380e+00 4.440e+00 6.520e+00
4.860e+00 1.125e+01 8.020e+00 5.830e+00 7.180e+00 3.770e+00 5.950e+00
8.950e+00 8.770e+00 1.000e+01 7.850e+00 7.370e+00 5.740e+00 8.670e+00
8.350e+00 7.220e+00 5.680e+00 6.390e+00 9.500e+00 5.760e+00 9.310e+00
6.450e+00 9.080e+00 8.320e+00 6.600e+00 6.200e+00 1.135e+01 7.020e+00
8.640e+00 7.770e+00 1.106e+01 7.500e+00 5.630e+00 7.750e+00 6.990e+00
1.011e+01 6.840e+00 9.730e+00 7.320e+00 7.560e+00 9.780e+00 6.770e+00
9.820e+00 8.840e+00 9.360e+00 9.750e+00 8.260e+00 7.920e+00 6.300e+00
4.950e+00 9.270e+00 8.570e+00 6.650e+00 8.610e+00 8.830e+00 1.233e+01
1.057e+01 8.280e+00 1.027e+01 6.120e+00 6.530e+00 7.890e+00 1.133e+01
7.310e+00 6.500e+00 9.250e+00 5.910e+00 9.410e+00 8.480e+00 8.860e+00
6.550e+00 7.940e+00 9.190e+00 1.009e+01 1.120e+01 6.850e+00 5.850e+00
7.570e+00 6.510e+00 5.660e+00 7.090e+00 9.000e+00 1.018e+01 8.620e+00
9.620e+00 8.450e+00 8.440e+00 7.350e+00 1.209e+01 1.054e+01 1.078e+01
6.430e+00 7.140e+00 9.610e+00 1.219e+01 7.860e+00 1.459e+01 9.890e+00
1.132e+01 8.400e+00 1.037e+01 1.031e+01 9.110e+00 9.560e+00 1.063e+01
7.110e+00 8.130e+00 1.193e+01 8.110e+00 1.100e+01 1.111e+01 7.200e+00
7.590e+00 7.230e+00 7.780e+00 7.950e+00 8.360e+00 6.670e+00 6.540e+00
6.560e+00 1.119e+01 1.019e+01 8.870e+00 1.167e+01 9.850e+00 1.238e+01
7.380e+00 1.095e+01 8.330e+00 9.470e+00 7.290e+00 7.830e+00 9.340e+00
1.091e+01 9.150e+00 1.103e+01 8.490e+00 1.548e+01 9.070e+00 7.360e+00
1.286e+01 6.110e+00 8.050e+00 9.060e+00 8.680e+00 1.149e+01 7.690e+00
1.102e+01 9.770e+00 9.380e+00 1.043e+01 1.098e+01 8.250e+00 1.200e+01
7.300e+00 1.400e+01 8.820e+00]
Cleaning Data¶
host_since: This feature records when the host joined the platform, i.e. how long the host has been in business. It is stored as a date, which the model cannot use directly, but it may still correlate with price, so we convert it to the number of months elapsed since the host joined.
# Converting the dates to datetime objects
listings['host_since'] = pd.to_datetime(listings['host_since'])
# Fixing the reference date so that results are reproducible (we can change it, but for now we fix it to 20 November 2024)
today_date = datetime.datetime(2024, 11, 20)
# Calculating the difference in months between the reference date and host_since (vectorized, no row loop needed)
listings['host_since'] = ((today_date.year - listings['host_since'].dt.year) * 12
                          + (today_date.month - listings['host_since'].dt.month))
listings['host_since'] = listings['host_since'].astype(float)
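The month arithmetic above can be checked on a few hand-picked dates. The following is a minimal sketch, assuming a fixed reference date of 20 November 2024; the helper name `months_between` is hypothetical and not part of the notebook.

```python
import datetime

def months_between(since_date, reference_date):
    """Whole calendar months between two dates, ignoring the day of the month."""
    return (reference_date.year - since_date.year) * 12 + (reference_date.month - since_date.month)

# Assumed reference date (the notebook's comment mentions 20 November 2024)
reference = datetime.datetime(2024, 11, 20)

print(months_between(datetime.datetime(2024, 10, 31), reference))  # 1
print(months_between(datetime.datetime(2015, 7, 19), reference))   # 112
```

Note that the day of the month is ignored, so a host who joined on 31 October counts as one month old on 20 November.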
host_response_time: This feature has only a few ordered categories, so we can map them directly to numbers (a faster response gets a higher value).
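# Assigning the result back instead of using inplace = True on a column selection,
# which newer pandas versions may not propagate to the DataFrame
listings['host_response_time'] = listings['host_response_time'].replace({'within a few hours': 2, 'within an hour': 3,
                                                                         'within a day': 1, 'a few days or more': 0})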
listings['host_response_time'].unique()
array([nan, 2., 3., 1., 0.])
host_response_rate: The host's response rate. It is related to the response time feature above, but before deciding anything we first convert it from a percentage string to a number.
# Stripping the '%' sign and converting to float; missing values stay NaN
listings['host_response_rate'] = listings['host_response_rate'].str.replace('%', '', regex = False).astype(float)
listings['host_response_rate'].unique()
array([ nan, 100., 77., 50., 88., 80., 0., 97., 33., 90., 86.,
94., 96., 75., 67., 91., 98., 69., 60., 40., 92., 95.,
25., 70., 20., 30., 76., 83., 89., 78., 93., 99., 79.,
71., 85., 65., 10., 73., 8., 63., 82., 57., 13., 14.,
17., 45., 6., 74., 47., 87., 9., 26., 81., 55., 62.,
27., 58., 84., 22., 46., 64., 29.])
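The percentage conversion can be tried on a small made-up column. This is a sketch under the assumption that the raw values are strings such as `'100%'` mixed with missing values; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Toy column mimicking host_response_rate: percent strings plus a missing value
rates = pd.Series(['100%', '77%', np.nan, '0%'])

# The .str accessor skips missing values, so no try/except is needed
rates_numeric = rates.str.replace('%', '', regex = False).astype(float)

print(rates_numeric.tolist())  # [100.0, 77.0, nan, 0.0]
```

The same pattern applies to any other column stored as a percentage string.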
host_acceptance_rate: Similar to the response rate, this feature tells us the host's acceptance rate.
# Stripping the '%' sign and converting to float; missing values stay NaN
listings['host_acceptance_rate'] = listings['host_acceptance_rate'].str.replace('%', '', regex = False).astype(float)
listings['host_acceptance_rate'].unique()
array([ nan, 38., 100., 60., 62., 94., 89., 50., 0., 96., 86.,
83., 46., 42., 75., 95., 92., 80., 67., 82., 40., 98.,
97., 71., 87., 73., 69., 78., 93., 61., 76., 91., 37.,
90., 88., 66., 84., 99., 65., 74., 33., 17., 77., 85.,
79., 56., 70., 59., 31., 68., 14., 63., 20., 25., 28.,
48., 81., 43., 29., 64., 51., 53., 22., 49., 44., 15.,
30., 27., 24., 39., 58., 35., 21., 72., 57., 55., 36.,
11., 34., 47., 18., 52., 8., 5., 13., 54., 41., 23.,
12., 26., 45., 9., 32., 16., 10., 2., 7.])
host_is_superhost: The values in this feature are fine, but they are stored as strings ('t' and 'f') when they should be boolean. We can easily transform them to 1 and 0.
listings['host_is_superhost'] = listings['host_is_superhost'].replace({'f': 0, 't': 1})
listings['host_is_superhost'].unique()
array([ 0., 1., nan])
host_verifications: This feature also has a limited set of categorical values, so we map each combination to a number.
listings['host_verifications'] = listings['host_verifications'].replace({
    "['email', 'phone', 'work_email']": 7,
    "['email', 'phone']": 6,
    "['phone', 'work_email']": 5,
    "['email', 'work_email']": 4,
    "['phone']": 3,
    "['work_email']": 2,
    "['email']": 1,
    '[]': 0
})
listings['host_verifications'].unique()
array([ 6., 7., 3., 5., 1., 2., 0., 4., nan])
host_has_profile_pic:
listings['host_has_profile_pic'] = listings['host_has_profile_pic'].replace({'f': 0, 't': 1})
listings['host_has_profile_pic'].unique()
array([ 1., 0., nan])
host_identity_verified:
listings['host_identity_verified'] = listings['host_identity_verified'].replace({'f': 0, 't': 1})
listings['host_identity_verified'].unique()
array([ 1., 0., nan])
room_type:
listings['room_type'] = listings['room_type'].replace({'Entire home/apt': 2, 'Private room': 1, 'Shared room': 0})
listings['room_type'].unique()
array([2, 1, 0], dtype=int64)
bathrooms & bathrooms_text: These two features carry the same information, so we can either combine them or remove one. We cannot simply drop bathrooms_text, however, because bathrooms has many NaN values where bathrooms_text does contain a value. Since we do not need the text feature itself, we use it to fill bathrooms and then remove bathrooms_text.
# Filling the bathrooms feature from bathrooms_text where it is empty/null
# Taking the leading token of the text (e.g. '1.5' from '1.5 baths'); non-numeric tokens become NaN
bathrooms_from_text = pd.to_numeric(listings['bathrooms_text'].str.split().str[0], errors = 'coerce')
listings['bathrooms'] = listings['bathrooms'].fillna(bathrooms_from_text)
listings.drop(columns = ['bathrooms_text'], inplace = True)
listings['bathrooms'].unique()
array([3. , 1.5, 1. , 0.5, 2. , 0. , 2.5, 4. , 5. , 3.5, 4.5, nan, 5.5,
6.5, 6. , 8. ])
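The fill logic can be illustrated on a toy frame. This is only a sketch: the sample values are made up, and it assumes the text column starts with the numeric count when one is present.

```python
import numpy as np
import pandas as pd

# Toy frame: the values are made up for illustration
toy = pd.DataFrame({
    'bathrooms': [1.0, np.nan, np.nan, np.nan],
    'bathrooms_text': ['1 bath', '1.5 shared baths', 'Half-bath', np.nan],
})

# Leading token of the text, converted to a number; non-numeric tokens become NaN
from_text = pd.to_numeric(toy['bathrooms_text'].str.split().str[0], errors = 'coerce')
toy['bathrooms'] = toy['bathrooms'].fillna(from_text)

print(toy['bathrooms'].tolist())  # [1.0, 1.5, nan, nan]
```

Text values without a leading number (such as 'Half-bath' in this toy data) stay missing under this approach and would need a separate rule if they matter.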
amenities: This feature holds the list of amenities available in each listing. For now, we transform it into the count of amenities per listing and move forward.
# Transforming the amenities feature into the number of amenities available, for summarized modeling
# Each raw value is a string holding a list literal, so we parse it and take its length
listings['amenities'] = df['amenities'].apply(ast.literal_eval).apply(len).astype(float)
listings['amenities'].unique()
array([ 13., 10., 43., 47., 34., 57., 27., 37., 25., 32., 26.,
9., 17., 11., 64., 31., 53., 12., 50., 28., 38., 62.,
69., 35., 7., 63., 61., 59., 44., 52., 73., 22., 16.,
65., 36., 41., 58., 18., 29., 30., 48., 24., 20., 42.,
15., 19., 8., 21., 49., 0., 86., 51., 5., 6., 33.,
55., 45., 56., 54., 40., 39., 46., 60., 68., 23., 14.,
74., 66., 2., 4., 83., 71., 72., 3., 79., 77., 76.,
70., 99., 75., 1., 67., 87., 85., 81., 78., 80., 82.,
91., 90., 104., 93., 103., 84., 94.])
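As a small illustration of the counting step, the sketch below parses a few made-up amenities strings. It assumes the raw values are list literals stored as text, as the use of `ast.literal_eval` above suggests.

```python
import ast

# Made-up amenities strings in the same list-literal format as the raw column
raw_amenities = ['["Wifi", "Kitchen", "Heating"]', '[]', '["Washer"]']

# literal_eval safely parses each string into a Python list; len gives the count
amenity_counts = [len(ast.literal_eval(item)) for item in raw_amenities]

print(amenity_counts)  # [3, 0, 1]
```

A count treats every amenity as equally valuable, which is a simplification; it keeps the feature numeric without creating one column per amenity.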
price: This is our target feature. It contains characters that are not needed (currency symbols and thousands separators), so we use the same replacement technique, this time with regular expressions, to strip all the special characters in one step.
listings['price'] = pd.to_numeric(listings['price'].replace({r'\$': '', ',': '', ' ': ''}, regex = True), errors = 'coerce')
listings['price'].head(10)
0 NaN
1 NaN
2 172.0
3 75.0
4 NaN
5 NaN
6 79.0
7 126.0
8 148.0
9 90.0
Name: price, dtype: float64
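The price cleaning can be sketched on a few made-up strings. This variant uses a single character class with `Series.str.replace` rather than the dictionary form; it is an equivalent illustration, not the notebook's exact call.

```python
import pandas as pd

# Made-up price strings in the format used by the raw price column
prices = pd.Series(['$1,116.00', '$75.00', None, 'n/a'])

# Strip dollar signs, thousands separators and spaces, then coerce to numbers
cleaned = pd.to_numeric(prices.str.replace(r'[\$, ]', '', regex = True), errors = 'coerce')

print(cleaned.tolist())  # [1116.0, 75.0, nan, nan]
```

With `errors = 'coerce'`, anything that still cannot be parsed becomes NaN instead of raising an error.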
has_availability:
listings['has_availability'] = listings['has_availability'].replace({'f': 0, 't': 1})
listings['has_availability'].unique()
array([ 1., nan, 0.])
first_review & last_review: These two features are dates. For now we are not sure whether they are needed and, if so, which transformation to apply; since we are not working with a time series, keeping a raw date feature would not make much sense. If they turn out to be useful, we will add them back and rerun the analysis.
listings.drop(columns = ['first_review', 'last_review'], inplace = True)
license: Although the license denotes a specific type of home, we remove it for now and will bring it back into the analysis if needed.
listings.drop(columns = ['license'], inplace = True)
instant_bookable:
listings['instant_bookable'] = listings['instant_bookable'].replace({'f': 0, 't': 1})
listings['instant_bookable'].unique()
array([0, 1], dtype=int64)
Now we are done with the basic transformations and can move forward with further analysis (categorical encoding).
Encoding Data¶
# Features that still have type object need categorical encoding before we move forward
# Categorical features
cat_cols = listings.select_dtypes(include = ['object'])
# Creating a LabelEncoder object
label_encoder = LabelEncoder()
# Looping in order to encode all the categorical features
for col in cat_cols:
    # Casting to str first, so missing values are encoded as their own 'nan' category
    listings[col] = label_encoder.fit_transform(listings[col].astype(str))
listings.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21825 entries, 0 to 21824
Data columns (total 51 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 host_since 21823 non-null float64
1 host_response_time 15741 non-null float64
2 host_response_rate 15741 non-null float64
3 host_acceptance_rate 16297 non-null float64
4 host_is_superhost 20914 non-null float64
5 host_neighbourhood 21825 non-null int32
6 host_listings_count 21823 non-null float64
7 host_total_listings_count 21823 non-null float64
8 host_verifications 21823 non-null float64
9 host_has_profile_pic 21823 non-null float64
10 host_identity_verified 21823 non-null float64
11 neighbourhood_cleansed 21825 non-null int32
12 latitude 21825 non-null float64
13 longitude 21825 non-null float64
14 property_type 21825 non-null int32
15 room_type 21825 non-null int64
16 accommodates 21825 non-null int64
17 bathrooms 21800 non-null float64
18 bedrooms 20185 non-null float64
19 beds 16519 non-null float64
20 amenities 21825 non-null float64
21 price 16536 non-null float64
22 minimum_nights 21825 non-null int64
23 maximum_nights 21825 non-null int64
24 minimum_minimum_nights 21825 non-null int64
25 maximum_minimum_nights 21825 non-null int64
26 minimum_maximum_nights 21825 non-null int64
27 maximum_maximum_nights 21825 non-null int64
28 minimum_nights_avg_ntm 21825 non-null float64
29 maximum_nights_avg_ntm 21825 non-null float64
30 has_availability 20774 non-null float64
31 availability_30 21825 non-null int64
32 availability_60 21825 non-null int64
33 availability_90 21825 non-null int64
34 availability_365 21825 non-null int64
35 number_of_reviews 21825 non-null int64
36 number_of_reviews_ltm 21825 non-null int64
37 number_of_reviews_l30d 21825 non-null int64
38 review_scores_rating 16610 non-null float64
39 review_scores_accuracy 16608 non-null float64
40 review_scores_cleanliness 16608 non-null float64
41 review_scores_checkin 16608 non-null float64
42 review_scores_communication 16608 non-null float64
43 review_scores_location 16607 non-null float64
44 review_scores_value 16608 non-null float64
45 instant_bookable 21825 non-null int64
46 calculated_host_listings_count 21825 non-null int64
47 calculated_host_listings_count_entire_homes 21825 non-null int64
48 calculated_host_listings_count_private_rooms 21825 non-null int64
49 calculated_host_listings_count_shared_rooms 21825 non-null int64
50 reviews_per_month 16610 non-null float64
dtypes: float64(28), int32(3), int64(20)
memory usage: 8.2 MB
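One consequence of label-encoding string-cast values is that missing entries become a category of their own, which is why the encoded columns above show no nulls. The sketch below illustrates this on made-up neighbourhood names; it converts values with Python's `str` in a list comprehension so that the behaviour does not depend on the pandas version.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Made-up category values with one missing entry
values = ['Downtown', np.nan, 'Annex', 'Downtown']

# str(np.nan) is the literal text 'nan', so the missing entry becomes a regular label
as_text = [str(v) for v in values]

encoder = LabelEncoder()
codes = encoder.fit_transform(as_text)

print(list(encoder.classes_))  # ['Annex', 'Downtown', 'nan']
print(codes.tolist())          # [1, 2, 0, 1]
```

The integer codes follow the sorted order of the labels, so they carry no meaningful ranking; tree-based models tolerate this better than linear ones.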
As we can see, our data has no object-type columns left. Next, we will clean the data by dealing with the missing values.
transformed_data = listings.copy()
Distribution Overview¶
Dealing with Missing Values¶
# Getting all the features which hold missing values
missing_features = transformed_data.columns[transformed_data.isnull().any()]
missing_features
Index(['host_since', 'host_response_time', 'host_response_rate',
'host_acceptance_rate', 'host_is_superhost', 'host_listings_count',
'host_total_listings_count', 'host_verifications',
'host_has_profile_pic', 'host_identity_verified', 'bathrooms',
'bedrooms', 'beds', 'price', 'has_availability', 'review_scores_rating',
'review_scores_accuracy', 'review_scores_cleanliness',
'review_scores_checkin', 'review_scores_communication',
'review_scores_location', 'review_scores_value', 'reviews_per_month'],
dtype='object')
Let's check the distribution of every feature that has missing values in order to decide which values to fill in.
# Number of plots
n_plots = len(missing_features)
# Calculating the number of rows needed for the subplots
n_rows = (n_plots // 3) + (n_plots % 3 > 0)
# Setting the subplots
fig, axes = plt.subplots(n_rows, 3, figsize = (20, n_rows * 5))
# Flattening the axes array for easier iteration
axes = axes.flatten()
# Looping through the missing features and creating a distribution plot for each one
for i, feature in enumerate(missing_features):
    # Location of the subplot
    ax = axes[i]
    # Visualization - Histogram
    sns.histplot(transformed_data[feature].dropna(), kde = True, ax = ax, color = 'skyblue', bins = 30)
    # Getting the value of mean & median
    mean_val = transformed_data[feature].mean()
    median_val = transformed_data[feature].median()
    # Adding mean & median vertical lines
    ax.axvline(mean_val, color = 'red', linestyle = '--', label = f'Mean: {mean_val:.2f}')
    ax.axvline(median_val, color = 'green', linestyle = '--', label = f'Median: {median_val:.2f}')
    # Labeling
    ax.set_title(f'Distribution of {feature}')
    ax.set_xlabel(feature)
    ax.set_ylabel('Frequency')
    # Legend
    ax.legend()
# Removing any unused subplot at the end
for i in range(n_plots, len(axes)):
    fig.delaxes(axes[i])
# Adjusting Layout for better spacing
plt.tight_layout()
# Showing
plt.show()
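The row count used for the subplot grid is a ceiling division written with integer arithmetic. A minimal sketch, with a hypothetical helper name `grid_rows`, shows the calculation for the 23 features listed above:

```python
import math

def grid_rows(n_plots, n_cols = 3):
    """Rows needed to fit n_plots subplots in a grid with n_cols columns."""
    # Full rows, plus one extra row if there is a remainder
    return (n_plots // n_cols) + (n_plots % n_cols > 0)

print(grid_rows(23))          # 8
print(math.ceil(23 / 3))      # 8
```

With 23 plots in 3 columns, the grid has 8 rows and 24 axes, so one unused axis is removed at the end.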
The visualization above just gives an overview of the features' distributions. Below, we define a function that can be called to produce the same visualization anywhere later in the code:
Visualization Function¶
def visualize_feature(feature, data):
    # Setting the figure size
    plt.figure(figsize = (8, 4))
    # Visualization - Histogram
    sns.histplot(data[feature].dropna(), kde = True, color = 'skyblue', bins = 30)
    # Getting the value of mean & median
    mean_val = data[feature].mean()
    median_val = data[feature].median()
    # Adding mean & median vertical lines
    plt.axvline(mean_val, color = 'red', linestyle = '--', label = f'Mean: {mean_val:.2f}')
    plt.axvline(median_val, color = 'green', linestyle = '--', label = f'Median: {median_val:.2f}')
    # Labeling
    plt.title(f'Distribution of {feature}')
    plt.xlabel(feature)
    plt.ylabel('Frequency')
    # Legend
    plt.legend()
    # Adjusting Layout for better spacing
    plt.tight_layout()
    # Showing
    plt.show()
Before moving forward, we will fill in the missing values of our Target variable price.
Imputing Target Variable¶
Price: Our other dataset, calendar.csv, contains the price of every listing by date. So, we will fill in the NaN values of our target variable price with those values.
# Reading calendar data
calendar = pd.read_csv('./Data-AirBNB/calendar.csv')
# Converting the feature 'date' to datetime for filtering ahead
calendar['date'] = pd.to_datetime(calendar['date'])
# Sorting so that the latest record of each listing comes first
calendar = calendar.sort_values(by = ['listing_id', 'date'], ascending = [True, False])
# Now, we remove the duplicate rows: this keeps the first (latest) row per listing and deletes further repetitions
calendar = calendar.drop_duplicates(subset = 'listing_id', keep = 'first')
# Cleaning the price values
calendar['price'] = pd.to_numeric(calendar['price'].replace({r'\$': '', ',': '', ' ': ''}, regex = True), errors = 'coerce')
calendar.head(5)
| | listing_id | date | available | price | adjusted_price | minimum_nights | maximum_nights |
|---|---|---|---|---|---|---|---|
| 364 | 1419 | 2025-09-05 | f | 469.0 | NaN | 28.0 | 730.0 |
| 729 | 8077 | 2025-09-05 | f | 75.0 | NaN | 180.0 | 365.0 |
| 1094 | 26654 | 2025-09-05 | t | 155.0 | NaN | 28.0 | 1125.0 |
| 2338 | 27423 | 2025-09-05 | f | 75.0 | NaN | 90.0 | 365.0 |
| 3582 | 30931 | 2025-09-05 | f | 100.0 | NaN | 180.0 | 365.0 |
# Getting the IDs back from the main source
transformed_data['listing_id'] = df['id']
# Building a lookup of the latest calendar price per listing (listing_id is unique after deduplication)
latest_price = calendar.set_index('listing_id')['price']
# Filling only the missing prices from the lookup; listings without a calendar entry stay NaN
transformed_data['price'] = transformed_data['price'].fillna(transformed_data['listing_id'].map(latest_price))
# Dropping the id feature again as it is not needed anymore
transformed_data = transformed_data.drop(columns = ['listing_id']).reset_index(drop = True)
transformed_data['price']
0 469.0
1 75.0
2 172.0
3 75.0
4 100.0
...
21820 350.0
21821 89.0
21822 170.0
21823 150.0
21824 245.0
Name: price, Length: 21825, dtype: float64
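The calendar-based fill can be sketched on toy data. The frames and values below are made up; the sketch assumes, as in the notebook, that the latest calendar row per listing is the one to use.

```python
import numpy as np
import pandas as pd

# Made-up calendar rows: two listings, two dates each
calendar_toy = pd.DataFrame({
    'listing_id': [1, 1, 2, 2],
    'date': pd.to_datetime(['2025-09-04', '2025-09-05', '2025-09-04', '2025-09-05']),
    'price': [90.0, 100.0, 50.0, 55.0],
})
listings_toy = pd.DataFrame({'listing_id': [1, 2, 3], 'price': [np.nan, 60.0, np.nan]})

# Keep only the latest calendar row per listing and turn it into a lookup
latest = (calendar_toy.sort_values(['listing_id', 'date'], ascending = [True, False])
                      .drop_duplicates(subset = 'listing_id', keep = 'first')
                      .set_index('listing_id')['price'])

# Fill missing listing prices from the lookup; unmatched listings stay NaN
listings_toy['price'] = listings_toy['price'].fillna(listings_toy['listing_id'].map(latest))

print(listings_toy['price'].tolist())  # [100.0, 60.0, nan]
```

Existing prices are left untouched (listing 2 keeps 60.0), and a listing that is absent from the calendar (listing 3) remains missing rather than raising an error.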
Let's create a function that fits a model (random forest) which we will run every time we fix a feature (imputing, deleting, etc.), so that we can keep cross-checking whether the model is improving or not.
Model Check Function¶
def model_check(data):
    # Dropping every row that still has a missing value in any feature
    model_data = data.dropna()
    # Separating features (X) and target (y)
    X = model_data.drop('price', axis = 1)
    y = model_data['price']
    # Splitting the data (test & train)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 42)
    # Creating random forest object
    rf_model = RandomForestRegressor(n_estimators = 200, random_state = 42, n_jobs = -1, oob_score = False, max_features = 11, bootstrap = False)
    # Fitting/Training the model
    rf_model.fit(X_train, y_train)
    # Predicting
    y_pred_rf = rf_model.predict(X_test)
    # Metrics
    mse_rf = mean_squared_error(y_test, y_pred_rf)
    mae_rf = mean_absolute_error(y_test, y_pred_rf)
    rmse_rf = np.sqrt(mse_rf)
    r2_rf = r2_score(y_test, y_pred_rf)
    return f"MSE: {mse_rf}, MAE: {mae_rf}, RMSE: {rmse_rf}, R2: {r2_rf}"
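To make the four reported metrics concrete, the sketch below computes them by hand with NumPy on three made-up price predictions. The numbers are illustrative only and are not results from the notebook.

```python
import numpy as np

# Made-up true and predicted prices, for illustration only
y_true = np.array([100.0, 150.0, 200.0])
y_pred = np.array([110.0, 140.0, 230.0])

errors = y_true - y_pred
mse = float(np.mean(errors ** 2))        # mean squared error
mae = float(np.mean(np.abs(errors)))     # mean absolute error
rmse = float(np.sqrt(mse))               # root mean squared error, in price units
# R2: one minus the ratio of residual variation to total variation around the mean
r2 = float(1 - np.sum(errors ** 2) / np.sum((y_true - y_true.mean()) ** 2))

print(round(mse, 2), round(mae, 2), round(rmse, 2), round(r2, 2))  # 366.67 16.67 19.15 0.78
```

RMSE and MAE are in the same units as price, which makes them easier to interpret than MSE; the single large error of 30 pulls RMSE above MAE.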
Let's also create an imputation function that can be called for each feature and returns the candidate datasets to pass to the model check:
Imputation Function¶
def impute_feature(feature, data):
    # Copying the data into 3 DFs
    df_mean_imputed = data.copy()
    df_median_imputed = data.copy()
    df_multiple_imputed = data.copy()
    # Mean Imputation
    mean_value = df_mean_imputed[feature].mean()
    df_mean_imputed[feature] = df_mean_imputed[feature].fillna(mean_value)
    # Median Imputation
    median_value = df_median_imputed[feature].median()
    df_median_imputed[feature] = df_median_imputed[feature].fillna(median_value)
    # Multiple imputation using IterativeImputer
    # (only this one column is passed, so the imputer has no other features to learn from)
    imputer = IterativeImputer()
    # fit_transform returns a 2D array, so we flatten it before assigning it to the column
    df_multiple_imputed[feature] = imputer.fit_transform(
        df_multiple_imputed[[feature]]
    ).ravel()
    # Removing the feature
    feature_drop = data.drop(columns = feature)
    # Dropping the rows where the feature is missing
    rows_drop = data.dropna(subset = [feature]).reset_index(drop = True)
    # Returning the DFs
    return df_mean_imputed, df_median_imputed, df_multiple_imputed, feature_drop, rows_drop
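The difference between mean and median imputation is easiest to see on a skewed column. The sketch below uses a made-up `beds` column with one extreme value; it only illustrates the two simple strategies, not the full function above.

```python
import numpy as np
import pandas as pd

# Made-up skewed column with one missing value and one extreme value
toy = pd.DataFrame({'beds': [1.0, 1.0, 2.0, np.nan, 12.0]})

# Fill the gap once with the mean and once with the median
mean_filled = toy['beds'].fillna(toy['beds'].mean())
median_filled = toy['beds'].fillna(toy['beds'].median())

print(mean_filled.iloc[3])    # 4.0
print(median_filled.iloc[3])  # 1.5
```

The outlier pulls the mean to 4.0, while the median stays at 1.5, which is why comparing both variants with the model check is worthwhile for skewed features.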
Now that price is imputed and the model check function has been created, let's move forward and analyze the other features for imputation. First, we check the null counts:
# Checking null values
transformed_data.isnull().sum()
host_since 2
host_response_time 6084
host_response_rate 6084
host_acceptance_rate 5528
host_is_superhost 911
host_neighbourhood 0
host_listings_count 2
host_total_listings_count 2
host_verifications 2
host_has_profile_pic 2
host_identity_verified 2
neighbourhood_cleansed 0
latitude 0
longitude 0
property_type 0
room_type 0
accommodates 0
bathrooms 25
bedrooms 1640
beds 5306
amenities 0
price 0
minimum_nights 0
maximum_nights 0
minimum_minimum_nights 0
maximum_minimum_nights 0
minimum_maximum_nights 0
maximum_maximum_nights 0
minimum_nights_avg_ntm 0
maximum_nights_avg_ntm 0
has_availability 1051
availability_30 0
availability_60 0
availability_90 0
availability_365 0
number_of_reviews 0
number_of_reviews_ltm 0
number_of_reviews_l30d 0
review_scores_rating 5215
review_scores_accuracy 5217
review_scores_cleanliness 5217
review_scores_checkin 5217
review_scores_communication 5217
review_scores_location 5218
review_scores_value 5217
instant_bookable 0
calculated_host_listings_count 0
calculated_host_listings_count_entire_homes 0
calculated_host_listings_count_private_rooms 0
calculated_host_listings_count_shared_rooms 0
reviews_per_month 5215
dtype: int64
# Getting the percentage of null values
transformed_data.apply(lambda x: f"{round((x.isnull().sum() / len(transformed_data)) * 100, 2)} %")
host_since    0.01 %
host_response_time    27.88 %
host_response_rate    27.88 %
host_acceptance_rate    25.33 %
host_is_superhost    4.17 %
host_neighbourhood    0.0 %
host_listings_count    0.01 %
host_total_listings_count    0.01 %
host_verifications    0.01 %
host_has_profile_pic    0.01 %
host_identity_verified    0.01 %
neighbourhood_cleansed    0.0 %
latitude    0.0 %
longitude    0.0 %
property_type    0.0 %
room_type    0.0 %
accommodates    0.0 %
bathrooms    0.11 %
bedrooms    7.51 %
beds    24.31 %
amenities    0.0 %
price    0.0 %
minimum_nights    0.0 %
maximum_nights    0.0 %
minimum_minimum_nights    0.0 %
maximum_minimum_nights    0.0 %
minimum_maximum_nights    0.0 %
maximum_maximum_nights    0.0 %
minimum_nights_avg_ntm    0.0 %
maximum_nights_avg_ntm    0.0 %
has_availability    4.82 %
availability_30    0.0 %
availability_60    0.0 %
availability_90    0.0 %
availability_365    0.0 %
number_of_reviews    0.0 %
number_of_reviews_ltm    0.0 %
number_of_reviews_l30d    0.0 %
review_scores_rating    23.89 %
review_scores_accuracy    23.9 %
review_scores_cleanliness    23.9 %
review_scores_checkin    23.9 %
review_scores_communication    23.9 %
review_scores_location    23.91 %
review_scores_value    23.9 %
instant_bookable    0.0 %
calculated_host_listings_count    0.0 %
calculated_host_listings_count_entire_homes    0.0 %
calculated_host_listings_count_private_rooms    0.0 %
calculated_host_listings_count_shared_rooms    0.0 %
reviews_per_month    23.89 %
dtype: object
Transforming Data¶
Imputing & Dropping¶
First, we start with the features that have roughly less than 5% missing values (bedrooms, at 7.51 %, is slightly above that but is handled in the same pass) and decide case by case whether to impute or remove:
- host_since
- host_is_superhost
- host_listings_count
- host_total_listings_count
- host_verifications
- host_has_profile_pic
- host_identity_verified
- bathrooms
- bedrooms
- has_availability
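The two groups of features can also be derived from the data instead of being typed by hand. This is a minimal sketch, assuming pandas is available; `split_by_missing_share` and the toy frame are hypothetical and only illustrate the 5% threshold used here.

```python
import numpy as np
import pandas as pd

def split_by_missing_share(df, threshold=5.0):
    """Return (low, high): columns with missing values below / at or above threshold (in %)."""
    pct = df.isnull().mean() * 100   # share of missing values per column
    pct = pct[pct > 0]               # ignore complete columns
    low = pct[pct < threshold].sort_values().index.tolist()
    high = pct[pct >= threshold].sort_values().index.tolist()
    return low, high

# Hypothetical toy frame with 100 rows
toy = pd.DataFrame({
    "host_since": [1.0, np.nan] + [1.0] * 98,   # 1 % missing
    "beds": [np.nan] * 24 + [2.0] * 76,         # 24 % missing
    "price": [100.0] * 100,                     # complete
})

low, high = split_by_missing_share(toy)
print(low, high)
```

On the real data this would be called as `split_by_missing_share(transformed_data)`; note that a strict 5% cut would place bedrooms in the second group.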
# Making another variable for cleaning data
clean_data = transformed_data.copy()
host_since
visualize_feature('host_since', clean_data)
mean_imp, med_imp, mult_imp, feature_drop, rows_drop = impute_feature('host_since', clean_data)
print(f"""Mean:- {model_check(mean_imp)}
Median:- {model_check(med_imp)}
Multiple:- {model_check(mult_imp)}
Feature Drop:- {model_check(feature_drop)}
Rows Drop:- {model_check(rows_drop)}""")
Mean:- MSE: 10165.095488708179, MAE: 53.58457713754647, RMSE: 100.82209821615587, R2: 0.6413249870208739
Median:- MSE: 10165.095488708179, MAE: 53.58457713754647, RMSE: 100.82209821615587, R2: 0.6413249870208739
Multiple:- MSE: 10165.095488708179, MAE: 53.58457713754647, RMSE: 100.82209821615587, R2: 0.6413249870208739
Feature Drop:- MSE: 10504.57746358039, MAE: 53.746289498141266, RMSE: 102.4918409610267, R2: 0.6293463782730535
Rows Drop:- MSE: 10165.095488708179, MAE: 53.58457713754647, RMSE: 100.82209821615587, R2: 0.6413249870208739
We imputed the missing values in host_since with three methods (mean, median and multiple imputation) and passed each version to the base model to check the effect of the imputation. All three methods, as well as dropping the affected rows, give the same R2 of about 0.64, while dropping the feature lowers it to about 0.63. So we can impute the values with any method we want. We will perform the same steps for all the remaining features and check for changes in the model.
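The comparison between options can also be done programmatically rather than by eye. The sketch below assumes that `model_check` returns a string of the form shown in the output (`MSE: ..., MAE: ..., RMSE: ..., R2: ...`); `parse_metrics` and `best_option` are hypothetical helpers, and the sample strings are the host_is_superhost results rounded to four decimals.

```python
import re

def parse_metrics(text):
    """Turn 'MSE: 1.0, MAE: 2.0, RMSE: 3.0, R2: 0.5' into a dict of floats."""
    pattern = r"(\w+): (-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)"
    return {name: float(value) for name, value in re.findall(pattern, text)}

def best_option(outputs):
    """Return the label with the highest R2 and the R2 of every option."""
    r2 = {label: parse_metrics(text)["R2"] for label, text in outputs.items()}
    return max(r2, key=r2.get), r2

# host_is_superhost results, rounded to four decimals for brevity
outputs = {
    "Mean": "MSE: 12539.7812, MAE: 51.8601, RMSE: 111.9812, R2: 0.6073",
    "Median": "MSE: 12892.2929, MAE: 51.2422, RMSE: 113.5442, R2: 0.5963",
    "Multiple": "MSE: 12539.7812, MAE: 51.8601, RMSE: 111.9812, R2: 0.6073",
    "Feature Drop": "MSE: 13048.2247, MAE: 51.8952, RMSE: 114.2288, R2: 0.5914",
    "Rows Drop": "MSE: 10165.0955, MAE: 53.5846, RMSE: 100.8221, R2: 0.6413",
}

best, r2_by_option = best_option(outputs)
print(best, r2_by_option[best])
```

When several options tie, `max` keeps the first one in dictionary order, so the order of the keys acts as a tie-break preference.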
clean_data = mult_imp.copy()
host_is_superhost
visualize_feature('host_is_superhost', clean_data)
mean_imp, med_imp, mult_imp, feature_drop, rows_drop = impute_feature('host_is_superhost', clean_data)
print(f"""Mean:- {model_check(mean_imp)}
Median:- {model_check(med_imp)}
Multiple:- {model_check(mult_imp)}
Feature Drop:- {model_check(feature_drop)}
Rows Drop:- {model_check(rows_drop)}""")
Mean:- MSE: 12539.781172376543, MAE: 51.86006613756614, RMSE: 111.98116436426504, R2: 0.6073229930286392
Median:- MSE: 12892.292865112433, MAE: 51.24221119929454, RMSE: 113.5442330773009, R2: 0.5962842647986072
Multiple:- MSE: 12539.781172376543, MAE: 51.86006613756614, RMSE: 111.98116436426504, R2: 0.6073229930286392
Feature Drop:- MSE: 13048.224654673722, MAE: 51.895185185185184, RMSE: 114.22882584826705, R2: 0.5914013384081931
Rows Drop:- MSE: 10165.095488708179, MAE: 53.58457713754647, RMSE: 100.82209821615587, R2: 0.6413249870208739
clean_data = rows_drop.copy()
As we can see, every imputation method lowered the model's R2 (from about 0.64 to about 0.60-0.61), and removing the feature lowers it even further (to about 0.59). Dropping the rows with missing values keeps R2 at about 0.64, so, going by the model_check output, we dropped those rows for this feature.
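For reference, the four numbers printed for each option are related in a simple way. The sketch below computes them from scratch on invented prices and predictions; it follows the standard definitions and is not necessarily how `model_check` is implemented.

```python
import math

def regression_metrics(y_true, y_pred):
    """Compute MSE, MAE, RMSE and R2 using the standard definitions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)               # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {"MSE": mse, "MAE": mae, "RMSE": math.sqrt(mse), "R2": 1 - ss_res / ss_tot}

# Invented values for illustration only
prices = [100.0, 150.0, 200.0, 250.0]
preds = [110.0, 140.0, 210.0, 240.0]
metrics = regression_metrics(prices, preds)
print(metrics)
```

RMSE is the square root of MSE, which can be checked against the outputs in this section (100.822² ≈ 10165.1), and R2 is a fraction of explained variance, so 0.64 means about 64% of the variance in price.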
Let's move ahead and follow the same rules for all the features¶
host_listings_count
visualize_feature('host_listings_count', clean_data)
mean_imp, med_imp, mult_imp, feature_drop, rows_drop = impute_feature('host_listings_count', clean_data)
print(f"""Mean:- {model_check(mean_imp)}
Median:- {model_check(med_imp)}
Multiple:- {model_check(mult_imp)}
Feature Drop:- {model_check(feature_drop)}
Rows Drop:- {model_check(rows_drop)}""")
Mean:- MSE: 10165.095488708179, MAE: 53.58457713754647, RMSE: 100.82209821615587, R2: 0.6413249870208739
Median:- MSE: 10165.095488708179, MAE: 53.58457713754647, RMSE: 100.82209821615587, R2: 0.6413249870208739
Multiple:- MSE: 10165.095488708179, MAE: 53.58457713754647, RMSE: 100.82209821615587, R2: 0.6413249870208739
Feature Drop:- MSE: 10377.001284293681, MAE: 53.28509293680297, RMSE: 101.86756738183985, R2: 0.6338478989732101
Rows Drop:- MSE: 10165.095488708179, MAE: 53.58457713754647, RMSE: 100.82209821615587, R2: 0.6413249870208739
clean_data = mult_imp.copy()
host_total_listings_count
visualize_feature('host_total_listings_count', clean_data)
mean_imp, med_imp, mult_imp, feature_drop, rows_drop = impute_feature('host_total_listings_count', clean_data)
print(f"""Mean:- {model_check(mean_imp)}
Median:- {model_check(med_imp)}
Multiple:- {model_check(mult_imp)}
Feature Drop:- {model_check(feature_drop)}
Rows Drop:- {model_check(rows_drop)}""")
Mean:- MSE: 10165.095488708179, MAE: 53.58457713754647, RMSE: 100.82209821615587, R2: 0.6413249870208739
Median:- MSE: 10165.095488708179, MAE: 53.58457713754647, RMSE: 100.82209821615587, R2: 0.6413249870208739
Multiple:- MSE: 10165.095488708179, MAE: 53.58457713754647, RMSE: 100.82209821615587, R2: 0.6413249870208739
Feature Drop:- MSE: 10668.001517646375, MAE: 53.62444005576207, RMSE: 103.28601801621735, R2: 0.6235799666561306
Rows Drop:- MSE: 10165.095488708179, MAE: 53.58457713754647, RMSE: 100.82209821615587, R2: 0.6413249870208739
clean_data = mult_imp.copy()
host_verifications
visualize_feature('host_verifications', clean_data)
mean_imp, med_imp, mult_imp, feature_drop, rows_drop = impute_feature('host_verifications', clean_data)
print(f"""Mean:- {model_check(mean_imp)}
Median:- {model_check(med_imp)}
Multiple:- {model_check(mult_imp)}
Feature Drop:- {model_check(feature_drop)}
Rows Drop:- {model_check(rows_drop)}""")
Mean:- MSE: 10165.095488708179, MAE: 53.58457713754647, RMSE: 100.82209821615587, R2: 0.6413249870208739
Median:- MSE: 10165.095488708179, MAE: 53.58457713754647, RMSE: 100.82209821615587, R2: 0.6413249870208739
Multiple:- MSE: 10165.095488708179, MAE: 53.58457713754647, RMSE: 100.82209821615587, R2: 0.6413249870208739
Feature Drop:- MSE: 10352.554493285315, MAE: 53.39951208178439, RMSE: 101.74750362188409, R2: 0.6347105030768276
Rows Drop:- MSE: 10165.095488708179, MAE: 53.58457713754647, RMSE: 100.82209821615587, R2: 0.6413249870208739
clean_data = mult_imp.copy()
host_has_profile_pic
visualize_feature('host_has_profile_pic', clean_data)
mean_imp, med_imp, mult_imp, feature_drop, rows_drop = impute_feature('host_has_profile_pic', clean_data)
print(f"""Mean:- {model_check(mean_imp)}
Median:- {model_check(med_imp)}
Multiple:- {model_check(mult_imp)}
Feature Drop:- {model_check(feature_drop)}
Rows Drop:- {model_check(rows_drop)}""")
Mean:- MSE: 10165.095488708179, MAE: 53.58457713754647, RMSE: 100.82209821615587, R2: 0.6413249870208739
Median:- MSE: 10165.095488708179, MAE: 53.58457713754647, RMSE: 100.82209821615587, R2: 0.6413249870208739
Multiple:- MSE: 10165.095488708179, MAE: 53.58457713754647, RMSE: 100.82209821615587, R2: 0.6413249870208739
Feature Drop:- MSE: 10039.053234316916, MAE: 52.96376394052045, RMSE: 100.1950758985536, R2: 0.6457723832386394
Rows Drop:- MSE: 10165.095488708179, MAE: 53.58457713754647, RMSE: 100.82209821615587, R2: 0.6413249870208739
clean_data = feature_drop.copy()
This is the first feature whose removal raised the R2 slightly (from about 0.641 to about 0.646). So we drop the feature and move ahead for now.
host_identity_verified
visualize_feature('host_identity_verified', clean_data)
mean_imp, med_imp, mult_imp, feature_drop, rows_drop = impute_feature('host_identity_verified', clean_data)
print(f"""Mean:- {model_check(mean_imp)}
Median:- {model_check(med_imp)}
Multiple:- {model_check(mult_imp)}
Feature Drop:- {model_check(feature_drop)}
Rows Drop:- {model_check(rows_drop)}""")
Mean:- MSE: 10039.053234316916, MAE: 52.96376394052045, RMSE: 100.1950758985536, R2: 0.6457723832386394
Median:- MSE: 10039.053234316916, MAE: 52.96376394052045, RMSE: 100.1950758985536, R2: 0.6457723832386394
Multiple:- MSE: 10039.053234316916, MAE: 52.96376394052045, RMSE: 100.1950758985536, R2: 0.6457723832386394
Feature Drop:- MSE: 10524.695340032527, MAE: 53.50630576208179, RMSE: 102.58993781084249, R2: 0.628636519757155
Rows Drop:- MSE: 10039.053234316916, MAE: 52.96376394052045, RMSE: 100.1950758985536, R2: 0.6457723832386394
clean_data = mult_imp.copy()
bathrooms
visualize_feature('bathrooms', clean_data)
mean_imp, med_imp, mult_imp, feature_drop, rows_drop = impute_feature('bathrooms', clean_data)
print(f"""Mean:- {model_check(mean_imp)}
Median:- {model_check(med_imp)}
Multiple:- {model_check(mult_imp)}
Feature Drop:- {model_check(feature_drop)}
Rows Drop:- {model_check(rows_drop)}""")
Mean:- MSE: 43091.50518031817, MAE: 54.76129354389224, RMSE: 207.5849348587661, R2: 0.30160163899510406
Median:- MSE: 43316.96363250116, MAE: 55.43455875522527, RMSE: 208.1272774830372, R2: 0.2979475588505279
Multiple:- MSE: 43091.50518031817, MAE: 54.76129354389224, RMSE: 207.5849348587661, R2: 0.30160163899510406
Feature Drop:- MSE: 43278.16783970041, MAE: 56.51264050162564, RMSE: 208.03405451920705, R2: 0.29857633517181326
Rows Drop:- MSE: 10039.053234316916, MAE: 52.96376394052045, RMSE: 100.1950758985536, R2: 0.6457723832386394
clean_data = rows_drop.copy()
bedrooms
visualize_feature('bedrooms', clean_data)
mean_imp, med_imp, mult_imp, feature_drop, rows_drop = impute_feature('bedrooms', clean_data)
print(f"""Mean:- {model_check(mean_imp)}
Median:- {model_check(med_imp)}
Multiple:- {model_check(mult_imp)}
Feature Drop:- {model_check(feature_drop)}
Rows Drop:- {model_check(rows_drop)}""")
Mean:- MSE: 8922.610786321413, MAE: 51.716618671621, RMSE: 94.45957223236518, R2: 0.6475003904556498
Median:- MSE: 9123.686382698561, MAE: 51.599275429633074, RMSE: 95.51798983803292, R2: 0.6395566315145446
Multiple:- MSE: 8922.610786321413, MAE: 51.716618671621, RMSE: 94.45957223236518, R2: 0.6475003904556498
Feature Drop:- MSE: 10330.488675998606, MAE: 53.64445192754297, RMSE: 101.63901158511237, R2: 0.5918803014164484
Rows Drop:- MSE: 10039.053234316916, MAE: 52.96376394052045, RMSE: 100.1950758985536, R2: 0.6457723832386394
clean_data = mult_imp.copy()
has_availability
visualize_feature('has_availability', clean_data)
mean_imp, med_imp, mult_imp, feature_drop, rows_drop = impute_feature('has_availability', clean_data)
print(f"""Mean:- {model_check(mean_imp)}
Median:- {model_check(med_imp)}
Multiple:- {model_check(mult_imp)}
Feature Drop:- {model_check(feature_drop)}
Rows Drop:- {model_check(rows_drop)}""")
Mean:- MSE: 18090.303518500466, MAE: 55.08571494893222, RMSE: 134.5001989533862, R2: 0.5413275797874368
Median:- MSE: 17699.388780280875, MAE: 54.86184540389973, RMSE: 133.03904983229876, R2: 0.5512390668386394
Multiple:- MSE: 18090.303518500466, MAE: 55.08571494893222, RMSE: 134.5001989533862, R2: 0.5413275797874368
Feature Drop:- MSE: 17591.500776949862, MAE: 54.855868152274844, RMSE: 132.63295509393532, R2: 0.5539745240712464
Rows Drop:- MSE: 8922.610786321413, MAE: 51.716618671621, RMSE: 94.45957223236518, R2: 0.6475003904556498
clean_data = rows_drop.copy()
Finally, we are done handling the features with roughly less than 5% missing values.
# Getting the percentage of null values
clean_data.apply(lambda x: f"{round((x.isnull().sum() / len(clean_data)) * 100, 2)} %")
host_since    0.0 %
host_response_time    25.29 %
host_response_rate    25.29 %
host_acceptance_rate    22.66 %
host_is_superhost    0.0 %
host_neighbourhood    0.0 %
host_listings_count    0.0 %
host_total_listings_count    0.0 %
host_verifications    0.0 %
host_identity_verified    0.0 %
neighbourhood_cleansed    0.0 %
latitude    0.0 %
longitude    0.0 %
property_type    0.0 %
room_type    0.0 %
accommodates    0.0 %
bathrooms    0.0 %
bedrooms    0.0 %
beds    21.44 %
amenities    0.0 %
price    0.0 %
minimum_nights    0.0 %
maximum_nights    0.0 %
minimum_minimum_nights    0.0 %
maximum_minimum_nights    0.0 %
minimum_maximum_nights    0.0 %
maximum_maximum_nights    0.0 %
minimum_nights_avg_ntm    0.0 %
maximum_nights_avg_ntm    0.0 %
has_availability    0.0 %
availability_30    0.0 %
availability_60    0.0 %
availability_90    0.0 %
availability_365    0.0 %
number_of_reviews    0.0 %
number_of_reviews_ltm    0.0 %
number_of_reviews_l30d    0.0 %
review_scores_rating    22.08 %
review_scores_accuracy    22.08 %
review_scores_cleanliness    22.09 %
review_scores_checkin    22.08 %
review_scores_communication    22.08 %
review_scores_location    22.08 %
review_scores_value    22.08 %
instant_bookable    0.0 %
calculated_host_listings_count    0.0 %
calculated_host_listings_count_entire_homes    0.0 %
calculated_host_listings_count_private_rooms    0.0 %
calculated_host_listings_count_shared_rooms    0.0 %
reviews_per_month    22.08 %
dtype: object
clean_data.shape
(19853, 50)
Now we move on to the features with more than 5% missing values:
- host_response_time
- host_response_rate
- host_acceptance_rate
- beds
- review_scores_rating
- review_scores_accuracy
- review_scores_cleanliness
- review_scores_checkin
- review_scores_communication
- review_scores_location
- review_scores_value
- reviews_per_month
We split the imputation into two passes (< 5% and > 5% missing) only to keep the code clearer; the process itself is unchanged. We run the same logic again, check the model's R2 for each option and choose the imputation method accordingly.
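Since the same five-way comparison is repeated for every feature, it could be wrapped in one helper. This is only a sketch: `evaluate_options` and `report` are hypothetical names, and the notebook's `impute_feature` and `model_check` are passed in as arguments so that the example runs on its own with stand-in functions.

```python
LABELS = ("Mean", "Median", "Multiple", "Feature Drop", "Rows Drop")

def evaluate_options(feature, data, impute_fn, check_fn):
    """Score every candidate returned by impute_fn and keep it next to its score."""
    candidates = impute_fn(feature, data)
    return {label: (frame, check_fn(frame)) for label, frame in zip(LABELS, candidates)}

def report(results):
    """Format the scores in the same 'Label:- score' layout used in this notebook."""
    return "\n".join(f"{label}:- {score}" for label, (_, score) in results.items())

# Stand-ins for impute_feature / model_check, for demonstration only
def fake_impute(feature, data):
    filled = [0 if v is None else v for v in data]
    dropped = [v for v in data if v is not None]
    return filled, filled, filled, [], dropped

def fake_check(frame):
    return f"rows: {len(frame)}"

results = evaluate_options("beds", [1, None, 3], fake_impute, fake_check)
print(report(results))
```

In the notebook the call would look like `evaluate_options('beds', clean_data, impute_feature, model_check)`, after which the chosen frame can be taken from `results[label][0]`.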
host_response_time
visualize_feature('host_response_time', clean_data)
mean_imp, med_imp, mult_imp, feature_drop, rows_drop = impute_feature('host_response_time', clean_data)
print(f"""Mean:- {model_check(mean_imp)}
Median:- {model_check(med_imp)}
Multiple:- {model_check(mult_imp)}
Feature Drop:- {model_check(feature_drop)}
Rows Drop:- {model_check(rows_drop)}""")
Mean:- MSE: 8922.610786321413, MAE: 51.716618671621, RMSE: 94.45957223236518, R2: 0.6475003904556498
Median:- MSE: 8922.610786321413, MAE: 51.716618671621, RMSE: 94.45957223236518, R2: 0.6475003904556498
Multiple:- MSE: 8922.610786321413, MAE: 51.716618671621, RMSE: 94.45957223236518, R2: 0.6475003904556498
Feature Drop:- MSE: 9136.766535415698, MAE: 52.369986065954485, RMSE: 95.58643489227798, R2: 0.6390398826799277
Rows Drop:- MSE: 8922.610786321413, MAE: 51.716618671621, RMSE: 94.45957223236518, R2: 0.6475003904556498
clean_data = mult_imp.copy()
host_response_rate
visualize_feature('host_response_rate', clean_data)
mean_imp, med_imp, mult_imp, feature_drop, rows_drop = impute_feature('host_response_rate', clean_data)
print(f"""Mean:- {model_check(mean_imp)}
Median:- {model_check(med_imp)}
Multiple:- {model_check(mult_imp)}
Feature Drop:- {model_check(feature_drop)}
Rows Drop:- {model_check(rows_drop)}""")
Mean:- MSE: 61299.810237428566, MAE: 57.039771428571434, RMSE: 247.58798484059878, R2: 0.2825718038541226
Median:- MSE: 60836.26963080219, MAE: 57.09521098901098, RMSE: 246.65009554184687, R2: 0.2879968957094523
Multiple:- MSE: 61299.810237428566, MAE: 57.039771428571434, RMSE: 247.58798484059878, R2: 0.2825718038541226
Feature Drop:- MSE: 60246.38296568132, MAE: 57.316863736263734, RMSE: 245.4513861555508, R2: 0.29490069075300973
Rows Drop:- MSE: 8922.610786321413, MAE: 51.716618671621, RMSE: 94.45957223236518, R2: 0.6475003904556498
clean_data = rows_drop.copy()
host_acceptance_rate
visualize_feature('host_acceptance_rate', clean_data)
mean_imp, med_imp, mult_imp, feature_drop, rows_drop = impute_feature('host_acceptance_rate', clean_data)
print(f"""Mean:- {model_check(mean_imp)}
Median:- {model_check(med_imp)}
Multiple:- {model_check(mult_imp)}
Feature Drop:- {model_check(feature_drop)}
Rows Drop:- {model_check(rows_drop)}""")
Mean:- MSE: 10192.225307390512, MAE: 51.38265054744526, RMSE: 100.95655158230451, R2: 0.6327382915757953
Median:- MSE: 10223.960468476278, MAE: 51.263270985401455, RMSE: 101.1136017975637, R2: 0.6315947621574429
Multiple:- MSE: 10192.225307390512, MAE: 51.38265054744526, RMSE: 100.95655158230451, R2: 0.6327382915757953
Feature Drop:- MSE: 10292.263703695255, MAE: 51.86634580291971, RMSE: 101.45079449514063, R2: 0.6291335564736134
Rows Drop:- MSE: 8922.610786321413, MAE: 51.716618671621, RMSE: 94.45957223236518, R2: 0.6475003904556498
clean_data = rows_drop.copy()
beds
visualize_feature('beds', clean_data)
mean_imp, med_imp, mult_imp, feature_drop, rows_drop = impute_feature('beds', clean_data)
print(f"""Mean:- {model_check(mean_imp)}
Median:- {model_check(med_imp)}
Multiple:- {model_check(mult_imp)}
Feature Drop:- {model_check(feature_drop)}
Rows Drop:- {model_check(rows_drop)}""")
Mean:- MSE: 32286.720948833754, MAE: 62.22030958439356, RMSE: 179.6850604497596, R2: 0.4253728795932271
Median:- MSE: 33354.02798757421, MAE: 63.586925360474964, RMSE: 182.63085168605608, R2: 0.40637734358840205
Multiple:- MSE: 32286.720948833754, MAE: 62.22030958439356, RMSE: 179.6850604497596, R2: 0.4253728795932271
Feature Drop:- MSE: 33077.59752327184, MAE: 62.44091391009329, RMSE: 181.87247599148102, R2: 0.41129715077311146
Rows Drop:- MSE: 8922.610786321413, MAE: 51.716618671621, RMSE: 94.45957223236518, R2: 0.6475003904556498
clean_data = rows_drop.copy()
review_scores_rating
visualize_feature('review_scores_rating', clean_data)
mean_imp, med_imp, mult_imp, feature_drop, rows_drop = impute_feature('review_scores_rating', clean_data)
print(f"""Mean:- {model_check(mean_imp)}
Median:- {model_check(med_imp)}
Multiple:- {model_check(mult_imp)}
Feature Drop:- {model_check(feature_drop)}
Rows Drop:- {model_check(rows_drop)}""")
Mean:- MSE: 8922.610786321413, MAE: 51.716618671621, RMSE: 94.45957223236518, R2: 0.6475003904556498
Median:- MSE: 8922.610786321413, MAE: 51.716618671621, RMSE: 94.45957223236518, R2: 0.6475003904556498
Multiple:- MSE: 8922.610786321413, MAE: 51.716618671621, RMSE: 94.45957223236518, R2: 0.6475003904556498
Feature Drop:- MSE: 8654.793400905713, MAE: 51.13475150952159, RMSE: 93.03114210255464, R2: 0.6580808725644232
Rows Drop:- MSE: 8922.610786321413, MAE: 51.716618671621, RMSE: 94.45957223236518, R2: 0.6475003904556498
clean_data = feature_drop.copy()
This is another feature whose removal raised R2 slightly (from about 0.648 to about 0.658), so here the method is a feature drop rather than imputation.
review_scores_accuracy
visualize_feature('review_scores_accuracy', clean_data)
mean_imp, med_imp, mult_imp, feature_drop, rows_drop = impute_feature('review_scores_accuracy', clean_data)
print(f"""Mean:- {model_check(mean_imp)}
Median:- {model_check(med_imp)}
Multiple:- {model_check(mult_imp)}
Feature Drop:- {model_check(feature_drop)}
Rows Drop:- {model_check(rows_drop)}""")
Mean:- MSE: 8654.793400905713, MAE: 51.13475150952159, RMSE: 93.03114210255464, R2: 0.6580808725644232
Median:- MSE: 8654.793400905713, MAE: 51.13475150952159, RMSE: 93.03114210255464, R2: 0.6580808725644232
Multiple:- MSE: 8654.793400905713, MAE: 51.13475150952159, RMSE: 93.03114210255464, R2: 0.6580808725644232
Feature Drop:- MSE: 9597.270765873202, MAE: 52.24906874129122, RMSE: 97.9656611567196, R2: 0.6208470504117486
Rows Drop:- MSE: 8654.793400905713, MAE: 51.13475150952159, RMSE: 93.03114210255464, R2: 0.6580808725644232
clean_data = mult_imp.copy()
review_scores_cleanliness
visualize_feature('review_scores_cleanliness', clean_data)
mean_imp, med_imp, mult_imp, feature_drop, rows_drop = impute_feature('review_scores_cleanliness', clean_data)
print(f"""Mean:- {model_check(mean_imp)}
Median:- {model_check(med_imp)}
Multiple:- {model_check(mult_imp)}
Feature Drop:- {model_check(feature_drop)}
Rows Drop:- {model_check(rows_drop)}""")
Mean:- MSE: 8654.793400905713, MAE: 51.13475150952159, RMSE: 93.03114210255464, R2: 0.6580808725644232
Median:- MSE: 8654.793400905713, MAE: 51.13475150952159, RMSE: 93.03114210255464, R2: 0.6580808725644232
Multiple:- MSE: 8654.793400905713, MAE: 51.13475150952159, RMSE: 93.03114210255464, R2: 0.6580808725644232
Feature Drop:- MSE: 9162.663587888992, MAE: 51.75394101254064, RMSE: 95.72180309568448, R2: 0.6380167851691431
Rows Drop:- MSE: 8654.793400905713, MAE: 51.13475150952159, RMSE: 93.03114210255464, R2: 0.6580808725644232
clean_data = mult_imp.copy()
review_scores_checkin
visualize_feature('review_scores_checkin', clean_data)
mean_imp, med_imp, mult_imp, feature_drop, rows_drop = impute_feature('review_scores_checkin', clean_data)
print(f"""Mean:- {model_check(mean_imp)}
Median:- {model_check(med_imp)}
Multiple:- {model_check(mult_imp)}
Feature Drop:- {model_check(feature_drop)}
Rows Drop:- {model_check(rows_drop)}""")
Mean:- MSE: 8654.793400905713, MAE: 51.13475150952159, RMSE: 93.03114210255464, R2: 0.6580808725644232
Median:- MSE: 8654.793400905713, MAE: 51.13475150952159, RMSE: 93.03114210255464, R2: 0.6580808725644232
Multiple:- MSE: 8654.793400905713, MAE: 51.13475150952159, RMSE: 93.03114210255464, R2: 0.6580808725644232
Feature Drop:- MSE: 9066.90647520901, MAE: 51.788892243381326, RMSE: 95.22030495230001, R2: 0.6417997972985676
Rows Drop:- MSE: 8654.793400905713, MAE: 51.13475150952159, RMSE: 93.03114210255464, R2: 0.6580808725644232
clean_data = mult_imp.copy()
review_scores_communication
visualize_feature('review_scores_communication', clean_data)
mean_imp, med_imp, mult_imp, feature_drop, rows_drop = impute_feature('review_scores_communication', clean_data)
print(f"""Mean:- {model_check(mean_imp)}
Median:- {model_check(med_imp)}
Multiple:- {model_check(mult_imp)}
Feature Drop:- {model_check(feature_drop)}
Rows Drop:- {model_check(rows_drop)}""")
Mean:- MSE: 8654.793400905713, MAE: 51.13475150952159, RMSE: 93.03114210255464, R2: 0.6580808725644232
Median:- MSE: 8654.793400905713, MAE: 51.13475150952159, RMSE: 93.03114210255464, R2: 0.6580808725644232
Multiple:- MSE: 8654.793400905713, MAE: 51.13475150952159, RMSE: 93.03114210255464, R2: 0.6580808725644232
Feature Drop:- MSE: 9272.672662157456, MAE: 52.085520204366, RMSE: 96.29471772718095, R2: 0.6336707303366924
Rows Drop:- MSE: 8654.793400905713, MAE: 51.13475150952159, RMSE: 93.03114210255464, R2: 0.6580808725644232
clean_data = mult_imp.copy()
review_scores_location
visualize_feature('review_scores_location', clean_data)
mean_imp, med_imp, mult_imp, feature_drop, rows_drop = impute_feature('review_scores_location', clean_data)
print(f"""Mean:- {model_check(mean_imp)}
Median:- {model_check(med_imp)}
Multiple:- {model_check(mult_imp)}
Feature Drop:- {model_check(feature_drop)}
Rows Drop:- {model_check(rows_drop)}""")
Mean:- MSE: 8654.793400905713, MAE: 51.13475150952159, RMSE: 93.03114210255464, R2: 0.6580808725644232
Median:- MSE: 8654.793400905713, MAE: 51.13475150952159, RMSE: 93.03114210255464, R2: 0.6580808725644232
Multiple:- MSE: 8654.793400905713, MAE: 51.13475150952159, RMSE: 93.03114210255464, R2: 0.6580808725644232
Feature Drop:- MSE: 9165.49740116117, MAE: 51.61564328843474, RMSE: 95.73660429094595, R2: 0.6379048316057874
Rows Drop:- MSE: 8654.793400905713, MAE: 51.13475150952159, RMSE: 93.03114210255464, R2: 0.6580808725644232
clean_data = mult_imp.copy()
review_scores_value
visualize_feature('review_scores_value', clean_data)
mean_imp, med_imp, mult_imp, feature_drop, rows_drop = impute_feature('review_scores_value', clean_data)
print(f"""Mean:- {model_check(mean_imp)}
Median:- {model_check(med_imp)}
Multiple:- {model_check(mult_imp)}
Feature Drop:- {model_check(feature_drop)}
Rows Drop:- {model_check(rows_drop)}""")
Mean:- MSE: 8654.793400905713, MAE: 51.13475150952159, RMSE: 93.03114210255464, R2: 0.6580808725644232
Median:- MSE: 8654.793400905713, MAE: 51.13475150952159, RMSE: 93.03114210255464, R2: 0.6580808725644232
Multiple:- MSE: 8654.793400905713, MAE: 51.13475150952159, RMSE: 93.03114210255464, R2: 0.6580808725644232
Feature Drop:- MSE: 9447.420761054344, MAE: 52.02047840222944, RMSE: 97.19784339713686, R2: 0.6267670742090259
Rows Drop:- MSE: 8654.793400905713, MAE: 51.13475150952159, RMSE: 93.03114210255464, R2: 0.6580808725644232
clean_data = mult_imp.copy()
reviews_per_month
visualize_feature('reviews_per_month', clean_data)
mean_imp, med_imp, mult_imp, feature_drop, rows_drop = impute_feature('reviews_per_month', clean_data)
print(f"""Mean:- {model_check(mean_imp)}
Median:- {model_check(med_imp)}
Multiple:- {model_check(mult_imp)}
Feature Drop:- {model_check(feature_drop)}
Rows Drop:- {model_check(rows_drop)}""")
Mean:- MSE: 46768.35899868723, MAE: 56.3973073974703, RMSE: 216.25993387284487, R2: 0.28517903402287204
Median:- MSE: 46311.822850325785, MAE: 55.6945611345343, RMSE: 215.2018188824755, R2: 0.2921568629987479
Multiple:- MSE: 46768.35899868723, MAE: 56.3973073974703, RMSE: 216.25993387284487, R2: 0.28517903402287204
Feature Drop:- MSE: 46549.10039349367, MAE: 56.307974319662705, RMSE: 215.7524053017571, R2: 0.28853024521177983
Rows Drop:- MSE: 8654.793400905713, MAE: 51.13475150952159, RMSE: 93.03114210255464, R2: 0.6580808725644232
clean_data = rows_drop.copy()
So, at last, we are done with imputing values, dropping rows and dropping features.
clean_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10763 entries, 0 to 10762
Data columns (total 49 columns):
 #   Column   Non-Null Count   Dtype
---  ------   --------------   -----
 0   host_since   10763 non-null   float64
 1   host_response_time   10763 non-null   float64
 2   host_response_rate   10763 non-null   float64
 3   host_acceptance_rate   10763 non-null   float64
 4   host_is_superhost   10763 non-null   float64
 5   host_neighbourhood   10763 non-null   int32
 6   host_listings_count   10763 non-null   float64
 7   host_total_listings_count   10763 non-null   float64
 8   host_verifications   10763 non-null   float64
 9   host_identity_verified   10763 non-null   float64
 10  neighbourhood_cleansed   10763 non-null   int32
 11  latitude   10763 non-null   float64
 12  longitude   10763 non-null   float64
 13  property_type   10763 non-null   int32
 14  room_type   10763 non-null   int64
 15  accommodates   10763 non-null   int64
 16  bathrooms   10763 non-null   float64
 17  bedrooms   10763 non-null   float64
 18  beds   10763 non-null   float64
 19  amenities   10763 non-null   float64
 20  price   10763 non-null   float64
 21  minimum_nights   10763 non-null   int64
 22  maximum_nights   10763 non-null   int64
 23  minimum_minimum_nights   10763 non-null   int64
 24  maximum_minimum_nights   10763 non-null   int64
 25  minimum_maximum_nights   10763 non-null   int64
 26  maximum_maximum_nights   10763 non-null   int64
 27  minimum_nights_avg_ntm   10763 non-null   float64
 28  maximum_nights_avg_ntm   10763 non-null   float64
 29  has_availability   10763 non-null   float64
 30  availability_30   10763 non-null   int64
 31  availability_60   10763 non-null   int64
 32  availability_90   10763 non-null   int64
 33  availability_365   10763 non-null   int64
 34  number_of_reviews   10763 non-null   int64
 35  number_of_reviews_ltm   10763 non-null   int64
 36  number_of_reviews_l30d   10763 non-null   int64
 37  review_scores_accuracy   10763 non-null   float64
 38  review_scores_cleanliness   10763 non-null   float64
 39  review_scores_checkin   10763 non-null   float64
 40  review_scores_communication   10763 non-null   float64
 41  review_scores_location   10763 non-null   float64
 42  review_scores_value   10763 non-null   float64
 43  instant_bookable   10763 non-null   int64
 44  calculated_host_listings_count   10763 non-null   int64
 45  calculated_host_listings_count_entire_homes   10763 non-null   int64
 46  calculated_host_listings_count_private_rooms   10763 non-null   int64
 47  calculated_host_listings_count_shared_rooms   10763 non-null   int64
 48  reviews_per_month   10763 non-null   float64
dtypes: float64(26), int32(3), int64(20)
memory usage: 3.9 MB
Feature Selection¶
We'll remove one feature at a time and rerun the base model, looking for a higher R2. If removing a feature increases the R2 value, we can remove that feature permanently and continue:
feat_sel = clean_data.copy()
# Drop one feature at a time (never the target, price) and re-score the base model
for feature in feat_sel.columns:
    if feature != 'price':
        print(f'{feature}:-')
        # Only the feature-dropped frame returned by impute_feature is needed here
        _, _, _, feature_drop, _ = impute_feature(feature, feat_sel)
        print(f"""Feature Drop:- {model_check(feature_drop)}\n""")
host_since:-
Feature Drop:- MSE: 9625.765810067349, MAE: 52.53617510450535, RMSE: 98.11098720361215, R2: 0.6197213157817247

host_response_time:-
Feature Drop:- MSE: 9403.362930631678, MAE: 52.271658151416624, RMSE: 96.97093858796912, R2: 0.6285076374133707

host_response_rate:-
Feature Drop:- MSE: 9144.446175406409, MAE: 51.62478402229448, RMSE: 95.62659763583774, R2: 0.638736488285283

host_acceptance_rate:-
Feature Drop:- MSE: 9301.262796202971, MAE: 52.29496284254528, RMSE: 96.44305468100319, R2: 0.6325412390555849

host_is_superhost:-
Feature Drop:- MSE: 8913.793923049234, MAE: 51.40878077101719, RMSE: 94.41289066144111, R2: 0.6478487123689638

host_neighbourhood:-
Feature Drop:- MSE: 8251.50850817464, MAE: 50.50591267998142, RMSE: 90.83781430755938, R2: 0.6740131787724637

host_listings_count:-
Feature Drop:- MSE: 8658.8173058523, MAE: 51.08977705527171, RMSE: 93.05276624503057, R2: 0.6579219028462001

host_total_listings_count:-
Feature Drop:- MSE: 9201.8425395843, MAE: 51.70421040408733, RMSE: 95.92623488693957, R2: 0.6364689685596676

host_verifications:-
Feature Drop:- MSE: 9075.372496980957, MAE: 51.85198792382722, RMSE: 95.26474949833731, R2: 0.6414653358454714

host_identity_verified:-
Feature Drop:- MSE: 8898.117051799814, MAE: 51.51005341384115, RMSE: 94.32983118716906, R2: 0.6484680480238116

neighbourhood_cleansed:-
Feature Drop:- MSE: 9098.23823848119, MAE: 52.09086391082211, RMSE: 95.38468555528812, R2: 0.6405619943074621

latitude:-
Feature Drop:- MSE: 9551.209884486763, MAE: 54.0278959591268, RMSE: 97.73029153996607, R2: 0.6226667468092288

longitude:-
Feature Drop:- MSE: 8800.232519925687, MAE: 52.409958197863446, RMSE: 93.80955452365014, R2: 0.652335106678769

property_type:-
Feature Drop:- MSE: 9115.67557793776, MAE: 52.00793776126335, RMSE: 95.47604714239986, R2: 0.6398731090140055

room_type:-
Feature Drop:- MSE: 8835.153132535996, MAE: 51.15077101718533, RMSE: 93.99549527789083, R2: 0.6509555214200402

accommodates:-
Feature Drop:- MSE: 9094.510992835578, MAE: 52.97577566186716, RMSE: 95.36514558703078, R2: 0.6407092441053315

bathrooms:-
Feature Drop:- MSE: 9143.072434788668, MAE: 53.109312587087786, RMSE: 95.61941452858132, R2: 0.638790759735979

bedrooms:-
Feature Drop:- MSE: 9725.2821998026, MAE: 52.14912447747329, RMSE: 98.61684541599675, R2: 0.6157897884109784

beds:-
Feature Drop:- MSE: 8378.359680933581, MAE: 50.89205294937297, RMSE: 91.53338014589859, R2: 0.6690017544328166

amenities:-
Feature Drop:- MSE: 8540.55818023688, MAE: 51.478077101718526, RMSE: 92.41514042751263, R2: 0.6625938869327859

minimum_nights:-
Feature Drop:- MSE: 9442.976488829541, MAE: 51.64024616813748, RMSE: 97.17497871792688, R2: 0.6269426510958216

maximum_nights:-
Feature Drop:- MSE: 9146.358621110077, MAE: 51.686841616349284, RMSE: 95.63659666210461, R2: 0.6386609345734868

minimum_minimum_nights:-
Feature Drop:- MSE: 9562.209349117513, MAE: 52.44162563864375, RMSE: 97.78654993974126, R2: 0.6222321983255631

maximum_minimum_nights:-
Feature Drop:- MSE: 9389.062629958196, MAE: 51.79325592196935, RMSE: 96.89717555201594, R2: 0.629072589816259

minimum_maximum_nights:-
Feature Drop:- MSE: 9847.084164015328, MAE: 52.532635856943806, RMSE: 99.23247534963203, R2: 0.6109778397722971

maximum_maximum_nights:-
Feature Drop:- MSE: 9303.205598618208, MAE: 52.211377148165354, RMSE: 96.45312643257454, R2: 0.6324644860615126

minimum_nights_avg_ntm:-
Feature Drop:- MSE: 9766.170818218763, MAE: 52.38351370181143, RMSE: 98.82393848769013, R2: 0.6141744291431953

maximum_nights_avg_ntm:-
Feature Drop:- MSE: 9860.072622944726, MAE: 53.137524384579656, RMSE: 99.29789838130878, R2: 0.6104647134227522

has_availability:-
Feature Drop:- MSE: 9184.65209803762, MAE: 52.02367162099396, RMSE: 95.83659060107273, R2: 0.6371480998227268

availability_30:-
Feature Drop:- MSE: 9151.738380712959, MAE: 51.75203669298653, RMSE: 95.6647185785489, R2: 0.63844840001327

availability_60:-
Feature Drop:- MSE: 9180.875241848584, MAE: 51.3397631212262, RMSE: 95.81688390804923, R2: 0.637297309551115

availability_90:-
Feature Drop:- MSE: 10003.288638248954, MAE: 52.439739897817006, RMSE: 100.01644183957433, R2: 0.6048067742069533

availability_365:-
Feature Drop:- MSE: 10043.63491254064, MAE: 52.48329307942406, RMSE: 100.21793707984934, R2: 0.6032128409653296

number_of_reviews:-
Feature Drop:- MSE: 9010.185795529493, MAE: 51.500550394797955, RMSE: 94.92199848048656, R2: 0.64404062320916

number_of_reviews_ltm:-
Feature Drop:- MSE: 8918.248273153738, MAE: 51.33523455643288, RMSE: 94.43647744994377, R2: 0.6476727373421232

number_of_reviews_l30d:-
Feature Drop:- MSE: 9217.58097159777, MAE: 51.88913608917789, RMSE: 96.00823387396402, R2: 0.6358472008649372

review_scores_accuracy:-
Feature Drop:- MSE: 9597.270765873202, MAE: 52.24906874129122, RMSE: 97.9656611567196, R2: 0.6208470504117486

review_scores_cleanliness:-
Feature Drop:- MSE: 9162.663587888992, MAE: 51.75394101254064, RMSE: 95.72180309568448, R2: 0.6380167851691431

review_scores_checkin:-
Feature Drop:- MSE: 9066.90647520901, MAE: 51.788892243381326, RMSE: 95.22030495230001, R2: 0.6417997972985676

review_scores_communication:-
Feature Drop:- MSE: 9272.672662157456, MAE: 52.085520204366, RMSE: 96.29471772718095, R2: 0.6336707303366924

review_scores_location:-
Feature Drop:- MSE: 9165.49740116117, MAE: 51.61564328843474, RMSE: 95.73660429094595, R2: 0.6379048316057874

review_scores_value:-
Feature Drop:- MSE: 9447.420761054344, MAE: 52.02047840222944, RMSE: 97.19784339713686, R2: 0.6267670742090259

instant_bookable:-
Feature Drop:- MSE: 9121.619609823501, MAE: 51.83424059451927, RMSE: 95.50717046286891, R2: 0.6396382821265596

calculated_host_listings_count:-
Feature Drop:- MSE: 9430.414378251277, MAE: 52.14480492336275, RMSE: 97.11032065775129, R2: 0.6274389339865513

calculated_host_listings_count_entire_homes:-
Feature Drop:- MSE: 9238.65334006038, MAE: 52.43232698560148, RMSE: 96.11791373131432, R2: 0.6350147089146403

calculated_host_listings_count_private_rooms:-
Feature Drop:- MSE: 9124.77942595216, MAE: 51.79639804923362, RMSE: 95.52371132840348, R2: 0.6395134493866522

calculated_host_listings_count_shared_rooms:-
Feature Drop:- MSE: 9426.582823513703, MAE: 51.938678588016714, RMSE: 97.09059080834612, R2: 0.6275903046538691

reviews_per_month:-
Feature Drop:- MSE: 9381.561797770553, MAE: 51.587148165350676, RMSE: 96.85846270600496, R2: 0.629368920170762
From the above output, dropping most features gives a similar score. The exceptions, which give the highest R² when dropped, are as follows:
host_neighbourhood- R-Square: 0.6740131787724637
beds- R-Square: 0.6690017544328166
amenities- R-Square: 0.6625938869327859
So, based on the above observation, we can remove host_neighbourhood, the feature whose removal gives the highest R².
We will drop it, check the base model again, and then repeat the feature-drop loop to see whether removing any other feature boosts R² further.
feat_sel = feat_sel.drop(columns = ['host_neighbourhood'])
model_check(feat_sel)
'MSE: 8251.50850817464, MAE: 50.50591267998142, RMSE: 90.83781430755938, R2: 0.6740131787724637'
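The selection rule applied above (drop the feature whose removal leaves the highest R²) can be sketched in plain Python. The dictionary holds a handful of rounded R² values copied from the first feature-drop pass; `r2_after_drop` and `rank_drop_candidates` are illustrative names, not part of the notebook.

```python
# R² of the base model after dropping each feature (rounded values copied
# from the first feature-drop pass; only a subset is shown here).
r2_after_drop = {
    'host_since': 0.6197,
    'host_neighbourhood': 0.6740,
    'host_listings_count': 0.6579,
    'beds': 0.6690,
    'amenities': 0.6626,
    'availability_365': 0.6032,
}

def rank_drop_candidates(scores, top_n = 3):
    """Return the top_n features whose removal leaves the highest R²."""
    return sorted(scores, key = scores.get, reverse = True)[:top_n]

top_candidates = rank_drop_candidates(r2_after_drop)
print(top_candidates)
```

Only the first entry of the ranking is dropped in each pass, since removing one feature can change how the others score.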
for feature in feat_sel.columns:
    if feature != 'price':
        print(f'{feature}:-')
        _, _, _, feature_drop, _ = impute_feature(feature, feat_sel)
        print(f"""Feature Drop:- {model_check(feature_drop)}\n""")
host_since:- Feature Drop:- MSE: 9318.468203913146, MAE: 51.94708546214585, RMSE: 96.53221329645946, R2: 0.6318615165343264 host_response_time:- Feature Drop:- MSE: 9123.627227264284, MAE: 51.74047840222945, RMSE: 95.51768018154694, R2: 0.6395589685286747 host_response_rate:- Feature Drop:- MSE: 9094.911053843474, MAE: 51.50447515095216, RMSE: 95.36724308610097, R2: 0.6406934391629828 host_acceptance_rate:- Feature Drop:- MSE: 8797.917992487228, MAE: 51.00088016720854, RMSE: 93.79721740268859, R2: 0.652426545164418 host_is_superhost:- Feature Drop:- MSE: 9414.410384463537, MAE: 51.84169066418951, RMSE: 97.02788457172267, R2: 0.6280711930524714 host_listings_count:- Feature Drop:- MSE: 9342.664672909894, MAE: 52.30148397584765, RMSE: 96.65746051345387, R2: 0.6309056028361988 host_total_listings_count:- Feature Drop:- MSE: 9399.028499187181, MAE: 52.159577333952626, RMSE: 96.94858688597364, R2: 0.6286788748940109 host_verifications:- Feature Drop:- MSE: 8890.64937882025, MAE: 51.55482117974918, RMSE: 94.29024010373635, R2: 0.648763068379685 host_identity_verified:- Feature Drop:- MSE: 9448.848178100323, MAE: 51.73062703204831, RMSE: 97.20518596299439, R2: 0.6267106821996226 neighbourhood_cleansed:- Feature Drop:- MSE: 9210.063616569902, MAE: 51.76951463074779, RMSE: 95.96907635571941, R2: 0.6361441839762247 latitude:- Feature Drop:- MSE: 9091.143155794241, MAE: 53.721985601486296, RMSE: 95.34748636327149, R2: 0.6408422949881405 longitude:- Feature Drop:- MSE: 8962.71203493962, MAE: 52.4985578262889, RMSE: 94.67160099491092, R2: 0.6459161372792331 property_type:- Feature Drop:- MSE: 9511.673690501626, MAE: 51.99487227124942, RMSE: 97.5278098313585, R2: 0.6242286767506307 room_type:- Feature Drop:- MSE: 9204.469667928472, MAE: 51.801490942870416, RMSE: 95.93992739171982, R2: 0.6363651803593616 accommodates:- Feature Drop:- MSE: 9400.587022050628, MAE: 53.26331398049234, RMSE: 96.95662443613962, R2: 0.6286173033748683 bathrooms:- Feature Drop:- MSE: 9605.223566500232, 
MAE: 54.08470274036228, RMSE: 98.00624248740604, R2: 0.6205328644427595 bedrooms:- Feature Drop:- MSE: 9915.770128588018, MAE: 52.89165815141662, RMSE: 99.57796005436151, R2: 0.6082643093636675 beds:- Feature Drop:- MSE: 8816.090090118438, MAE: 51.31010915002323, RMSE: 93.89403649922842, R2: 0.6517086322717675 amenities:- Feature Drop:- MSE: 8816.516199326521, MAE: 51.5046214584301, RMSE: 93.89630556803884, R2: 0.6516917982606165 minimum_nights:- Feature Drop:- MSE: 8900.99473643753, MAE: 51.183562470970735, RMSE: 94.34508326583601, R2: 0.648354361263795 maximum_nights:- Feature Drop:- MSE: 8994.315453274501, MAE: 51.67609382257316, RMSE: 94.83836488085663, R2: 0.6446676021934721 minimum_minimum_nights:- Feature Drop:- MSE: 9435.714597375754, MAE: 52.44083604273108, RMSE: 97.1376065042564, R2: 0.6272295417787523 maximum_minimum_nights:- Feature Drop:- MSE: 9548.531008546213, MAE: 52.02593125870878, RMSE: 97.71658512528062, R2: 0.6227725793671736 minimum_maximum_nights:- Feature Drop:- MSE: 9005.201583534603, MAE: 51.503669298653044, RMSE: 94.89574059742937, R2: 0.644237531134896 maximum_maximum_nights:- Feature Drop:- MSE: 8995.77228597306, MAE: 51.40321411983279, RMSE: 94.84604517834711, R2: 0.6446100480795796 minimum_nights_avg_ntm:- Feature Drop:- MSE: 9376.265891105435, MAE: 51.997756618671616, RMSE: 96.83112046808833, R2: 0.6295781420091165 maximum_nights_avg_ntm:- Feature Drop:- MSE: 9035.958946969344, MAE: 51.36028564793312, RMSE: 95.05766116925739, R2: 0.6430224205735364 has_availability:- Feature Drop:- MSE: 9279.659596017187, MAE: 51.75809800278681, RMSE: 96.33098980087969, R2: 0.6333947022193118 availability_30:- Feature Drop:- MSE: 9729.831380213656, MAE: 52.53590803529958, RMSE: 98.6399076449976, R2: 0.6156100669867174 availability_60:- Feature Drop:- MSE: 9394.82174004877, MAE: 51.73112633534603, RMSE: 96.92688863286993, R2: 0.6288450685103468 availability_90:- Feature Drop:- MSE: 9542.934187923827, MAE: 51.92574082675336, RMSE: 97.68794289943784, R2: 
0.6229936892117389 availability_365:- Feature Drop:- MSE: 9579.147125290294, MAE: 52.31841151881097, RMSE: 97.87311748018602, R2: 0.6215630489442399 number_of_reviews:- Feature Drop:- MSE: 9436.110601730144, MAE: 52.007954017649794, RMSE: 97.13964485075157, R2: 0.6272138970998973 number_of_reviews_ltm:- Feature Drop:- MSE: 9329.351991546679, MAE: 52.18792382721784, RMSE: 96.5885707086852, R2: 0.6314315380243295 number_of_reviews_l30d:- Feature Drop:- MSE: 8974.993729737575, MAE: 51.948293079424054, RMSE: 94.73644351429694, R2: 0.6454309325869632 review_scores_accuracy:- Feature Drop:- MSE: 9183.790988132838, MAE: 52.20270320483046, RMSE: 95.8320979011356, R2: 0.6371821191151155 review_scores_cleanliness:- Feature Drop:- MSE: 9540.365933139805, MAE: 52.076479331165814, RMSE: 97.67479681647566, R2: 0.6230951515337273 review_scores_checkin:- Feature Drop:- MSE: 9159.559443892244, MAE: 52.07845796562936, RMSE: 95.70558731804661, R2: 0.6381394185074145 review_scores_communication:- Feature Drop:- MSE: 9112.987361762656, MAE: 51.2015490013934, RMSE: 95.46196814314409, R2: 0.6399793105703423 review_scores_location:- Feature Drop:- MSE: 9514.154059800278, MAE: 52.03165350673479, RMSE: 97.54052521798454, R2: 0.6241306864616627 review_scores_value:- Feature Drop:- MSE: 9158.146902728751, MAE: 51.818855085926614, RMSE: 95.69820741648587, R2: 0.6381952228252896 instant_bookable:- Feature Drop:- MSE: 8998.674341558291, MAE: 51.62675104505342, RMSE: 94.86134271429164, R2: 0.6444953985128589 calculated_host_listings_count:- Feature Drop:- MSE: 9173.602801823039, MAE: 51.90013237343242, RMSE: 95.77892671054025, R2: 0.6375846169694066 calculated_host_listings_count_entire_homes:- Feature Drop:- MSE: 9574.735241581515, MAE: 52.15449837436136, RMSE: 97.85057609223114, R2: 0.6217373462796212 calculated_host_listings_count_private_rooms:- Feature Drop:- MSE: 9041.59392575476, MAE: 51.91689270784951, RMSE: 95.08729634264905, R2: 0.6427998032399759 
calculated_host_listings_count_shared_rooms:- Feature Drop:- MSE: 9348.178170901067, MAE: 51.49002786809103, RMSE: 96.68597711613131, R2: 0.6306877847630299 reviews_per_month:- Feature Drop:- MSE: 8586.394411483976, MAE: 50.614781699953554, RMSE: 92.66279950165533, R2: 0.6607830656379294
In this second pass, no feature drop beats the current R² of 0.6740 (the best one, reviews_per_month, reaches 0.6608), so we do not remove any further features.
Removing Outliers¶
# Getting Q1 (25th percentile) and Q3 (75th percentile)
Q1 = feat_sel['price'].quantile(0.25)
Q3 = feat_sel['price'].quantile(0.75)
# Calculating the IQR
IQR = Q3 - Q1
# Defining the outlier bounds
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR
# Filtering out the outliers
filtered_data = feat_sel[(feat_sel['price'] >= lower_bound) & (feat_sel['price'] <= upper_bound)]
filtered_data.head(5)
| | host_since | host_response_time | host_response_rate | host_acceptance_rate | host_is_superhost | host_listings_count | host_total_listings_count | host_verifications | host_identity_verified | neighbourhood_cleansed | ... | review_scores_checkin | review_scores_communication | review_scores_location | review_scores_value | instant_bookable | calculated_host_listings_count | calculated_host_listings_count_entire_homes | calculated_host_listings_count_private_rooms | calculated_host_listings_count_shared_rooms | reviews_per_month |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 176.0 | 2.0 | 100.0 | 38.0 | 1.0 | 5.0 | 10.0 | 7.0 | 1.0 | 122 | ... | 4.64 | 4.76 | 4.86 | 4.67 | 0 | 5 | 5 | 0 | 0 | 0.25 |
| 1 | 172.0 | 1.0 | 77.0 | 62.0 | 1.0 | 9.0 | 19.0 | 6.0 | 1.0 | 104 | ... | 4.79 | 4.84 | 4.95 | 4.23 | 0 | 9 | 7 | 2 | 0 | 0.40 |
| 2 | 172.0 | 1.0 | 77.0 | 62.0 | 1.0 | 9.0 | 19.0 | 6.0 | 1.0 | 6 | ... | 4.63 | 4.69 | 4.92 | 4.21 | 0 | 9 | 7 | 2 | 0 | 0.53 |
| 3 | 172.0 | 1.0 | 77.0 | 62.0 | 1.0 | 9.0 | 19.0 | 6.0 | 1.0 | 104 | ... | 4.50 | 4.80 | 4.85 | 4.25 | 0 | 9 | 7 | 2 | 0 | 0.14 |
| 4 | 172.0 | 1.0 | 77.0 | 62.0 | 1.0 | 9.0 | 19.0 | 6.0 | 1.0 | 23 | ... | 4.80 | 4.86 | 4.92 | 4.50 | 0 | 9 | 7 | 2 | 0 | 0.36 |
5 rows × 48 columns
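As a worked illustration of the IQR rule used above, the sketch below applies the same bounds to a small made-up list of prices. The `quantile` helper is a hypothetical stand-in that uses linear interpolation, which is the default method of the pandas `quantile` call in the cell above.

```python
def quantile(values, q):
    """Linearly interpolated quantile of a list of numbers."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)          # fractional position in the sorted list
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])

prices = [50, 80, 100, 120, 150, 1000]    # made-up nightly prices

q1, q3 = quantile(prices, 0.25), quantile(prices, 0.75)
iqr = q3 - q1
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

# Keeping only the prices inside the bounds
kept = [p for p in prices if lower_bound <= p <= upper_bound]
print(q1, q3, lower_bound, upper_bound, kept)
```

Here Q1 is 85.0 and Q3 is 142.5, so the bounds are -1.25 and 228.75 and only the 1000 listing is filtered out. Because prices cannot be negative, the lower bound rarely removes anything; the rule mostly trims the expensive tail.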
Splitting the Data¶
model_data = filtered_data.copy()
# Separating features (X) and target (y)
X = model_data.drop('price', axis = 1)
y = model_data['price']
# Splitting the data (test & train)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 42)
# Printing the shapes
print("Training Features Shape:", X_train.shape)
print("Testing Features Shape:", X_test.shape)
print("Training Target Shape:", y_train.shape)
print("Testing Target Shape:", y_test.shape)
Training Features Shape: (8176, 47) Testing Features Shape: (2045, 47) Training Target Shape: (8176,) Testing Target Shape: (2045,)
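The printed shapes can be checked by hand. The sketch below assumes that scikit-learn rounds the test share up when `test_size` is a fraction and gives the remaining rows to the training set, which matches the output above.

```python
import math

n_rows = 8176 + 2045      # rows left after outlier filtering, per the shapes above
test_size = 0.2

n_test = math.ceil(test_size * n_rows)   # test share is rounded up
n_train = n_rows - n_test                # the rest goes to training
print(n_train, n_test)
```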
Models¶
1. Linear Regression¶
if os.path.exists('Models/PricePrediction/lr.pkl'):
    print('Model Exists. Loading the model')
    # Loading the model
    lr_model = joblib.load('Models/PricePrediction/lr.pkl')
else:
    print('Model not Found. Creating & Saving the model')
    # Creating Linear Regression object
    lr_model = LinearRegression()
    # Fitting/Training the model
    lr_model.fit(X_train, y_train)
    # Saving the model
    joblib.dump(lr_model, 'Models/PricePrediction/lr.pkl')
# Predicting
y_pred_lr = lr_model.predict(X_test)
Model Exists. Loading the model
# Metrics
mse_lr = mean_squared_error(y_test, y_pred_lr)
mae_lr = mean_absolute_error(y_test, y_pred_lr)
rmse_lr = np.sqrt(mse_lr)
r2_lr = r2_score(y_test, y_pred_lr)
print(f"Metrics:\nMSE: {mse_lr}, MAE: {mae_lr}, RMSE: {rmse_lr}, R2: {r2_lr}")
Metrics: MSE: 3996.6172754860736, MAE: 47.69454383549999, RMSE: 63.21880476160613, R2: 0.6132730441656875
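Every model in this section follows the same load-or-train pattern, so it could be factored into one helper. The sketch below is only an illustration: `load_or_train` is a hypothetical name, it uses the standard-library `pickle` instead of `joblib`, and a plain dictionary stands in for a fitted model.

```python
import os
import pickle
import tempfile

def load_or_train(path, train_fn):
    """Load a cached object from `path`, or build it with `train_fn` and cache it."""
    if os.path.exists(path):
        with open(path, 'rb') as f:
            return pickle.load(f), 'loaded'
    # Creating the target folder first so that saving works on a fresh checkout
    os.makedirs(os.path.dirname(path) or '.', exist_ok = True)
    model = train_fn()
    with open(path, 'wb') as f:
        pickle.dump(model, f)
    return model, 'trained'

# Usage with a stand-in "model" inside a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    cache_path = os.path.join(tmp, 'Models', 'PricePrediction', 'demo.pkl')
    first_model, first_status = load_or_train(cache_path, lambda: {'coef': [1.0, 2.0]})
    second_model, second_status = load_or_train(cache_path, lambda: {'coef': [0.0]})

print(first_status, second_status, second_model)
```

The second call returns the cached object and ignores its own `train_fn`. The same holds for the cells in this section: a saved `.pkl` file is reused even if the features or the split have changed, so the file should be deleted whenever the training data changes.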
2. Decision Tree¶
if os.path.exists('Models/PricePrediction/dt.pkl'):
    print('Model Exists. Loading the model')
    # Loading the model
    dt_model = joblib.load('Models/PricePrediction/dt.pkl')
else:
    print('Model not Found. Creating & Saving the model')
    # Creating decision tree object
    dt_model = DecisionTreeRegressor(random_state = 42, max_depth = 6)
    # Fitting/Training the model
    dt_model.fit(X_train, y_train)
    # Saving the model
    joblib.dump(dt_model, 'Models/PricePrediction/dt.pkl')
# Predicting
y_pred_dt = dt_model.predict(X_test)
Model Exists. Loading the model
# Metrics
mse_dt = mean_squared_error(y_test, y_pred_dt)
mae_dt = mean_absolute_error(y_test, y_pred_dt)
rmse_dt = np.sqrt(mse_dt)
r2_dt = r2_score(y_test, y_pred_dt)
print(f"Metrics:\nMSE: {mse_dt}, MAE: {mae_dt}, RMSE: {rmse_dt}, R2: {r2_dt}")
Metrics: MSE: 3435.458525837511, MAE: 42.11962659151027, RMSE: 58.6127846620301, R2: 0.6675727681654002
3. Random Forest¶
if os.path.exists('Models/PricePrediction/rf.pkl'):
    print('Model Exists. Loading the model')
    # Loading the model
    rf_model = joblib.load('Models/PricePrediction/rf.pkl')
else:
    print('Model not Found. Creating & Saving the model')
    # Creating random forest object
    rf_model = RandomForestRegressor(n_estimators = 145, random_state = 42, n_jobs = -1, max_features = 12, bootstrap = False)
    # Fitting/Training the model
    rf_model.fit(X_train, y_train)
    # Saving the model
    joblib.dump(rf_model, 'Models/PricePrediction/rf.pkl')
# Predicting
y_pred_rf = rf_model.predict(X_test)
Model Exists. Loading the model
# Metrics
mse_rf = mean_squared_error(y_test, y_pred_rf)
mae_rf = mean_absolute_error(y_test, y_pred_rf)
rmse_rf = np.sqrt(mse_rf)
r2_rf = r2_score(y_test, y_pred_rf)
print(f"Metrics:\nMSE: {mse_rf}, MAE: {mae_rf}, RMSE: {rmse_rf}, R2: {r2_rf}")
Metrics: MSE: 2435.355771177054, MAE: 34.10725908439423, RMSE: 49.34932391813543, R2: 0.7643462811569113
4. XGBoost¶
if os.path.exists('Models/PricePrediction/xgb.pkl'):
    print('Model Exists. Loading the model')
    # Loading the model
    xgb_model = joblib.load('Models/PricePrediction/xgb.pkl')
else:
    print('Model not Found. Creating & Saving the model')
    # Initializing the Model Object
    xgb_model = XGBRegressor(n_estimators = 130, learning_rate = 0.1, max_depth = 9, subsample = 0.8, colsample_bytree = 0.8, random_state = 42)
    # Fitting the model
    xgb_model.fit(X_train, y_train)
    # Saving the model
    joblib.dump(xgb_model, 'Models/PricePrediction/xgb.pkl')
# Predicting
y_pred_xgb = xgb_model.predict(X_test)
Model Exists. Loading the model
# Metrics
mse_xgb = mean_squared_error(y_test, y_pred_xgb)
mae_xgb = mean_absolute_error(y_test, y_pred_xgb)
rmse_xgb = np.sqrt(mse_xgb)
r2_xgb = r2_score(y_test, y_pred_xgb)
print(f"Metrics:\nMSE: {mse_xgb}, MAE: {mae_xgb}, RMSE: {rmse_xgb}, R2: {r2_xgb}")
Metrics: MSE: 2395.0204097663013, MAE: 33.73718720321842, RMSE: 48.93894573615477, R2: 0.7682492747276329
Metrics Comparison¶
df_metrics = pd.DataFrame({'Model': ['Linear Regression', 'Decision Tree', 'Random Forest', 'XGBoost'],
'MSE': [mse_lr, mse_dt, mse_rf, mse_xgb],
'MAE': [mae_lr, mae_dt, mae_rf, mae_xgb],
'RMSE': [rmse_lr, rmse_dt, rmse_rf, rmse_xgb],
'R-squared': [r2_lr, r2_dt, r2_rf, r2_xgb]})
df_metrics
| | Model | MSE | MAE | RMSE | R-squared |
|---|---|---|---|---|---|
| 0 | Linear Regression | 3996.617275 | 47.694544 | 63.218805 | 0.613273 |
| 1 | Decision Tree | 3435.458526 | 42.119627 | 58.612785 | 0.667573 |
| 2 | Random Forest | 2435.355771 | 34.107259 | 49.349324 | 0.764346 |
| 3 | XGBoost | 2395.020410 | 33.737187 | 48.938946 | 0.768249 |
Best Model¶
# Row indices of the highest and lowest R-Squared values
r2_idx_max = df_metrics['R-squared'].idxmax()
r2_idx_min = df_metrics['R-squared'].idxmin()
# Function - Highlighting the best row (green) and the worst row (red)
def highlight_max_r2(max_idx, min_idx):
    def highlight(s):
        return [
            'background-color: lightgreen' if i == max_idx
            else 'background-color: lightcoral' if i == min_idx
            else 'background-color: lightcyan'
            for i in range(len(s))
        ]
    return df_metrics.style.apply(highlight, axis = 0).applymap(lambda _: 'font-weight: bold;', subset = [df_metrics.columns[0]])
# Highlighting the rows with the highest and lowest R-squared
highlighted_df = highlight_max_r2(r2_idx_max, r2_idx_min)
highlighted_df
| | Model | MSE | MAE | RMSE | R-squared |
|---|---|---|---|---|---|
| 0 | Linear Regression | 3996.617275 | 47.694544 | 63.218805 | 0.613273 |
| 1 | Decision Tree | 3435.458526 | 42.119627 | 58.612785 | 0.667573 |
| 2 | Random Forest | 2435.355771 | 34.107259 | 49.349324 | 0.764346 |
| 3 | XGBoost | 2395.020410 | 33.737187 | 48.938946 | 0.768249 |
Best Metrics¶
# Creating a custom colormap from green to red
cmap = mpl.colors.LinearSegmentedColormap.from_list('green_red', ['lightgreen', 'lightcoral'])
cmap_r2 = mpl.colors.LinearSegmentedColormap.from_list('red_green', ['lightcoral', 'lightgreen'])
# Custom function to apply separate gradients
def apply_custom_gradient(styler):
    for col in ['MSE', 'MAE', 'RMSE']:
        styler = styler.background_gradient(subset = [col], cmap = cmap)
    styler = styler.background_gradient(subset = ['R-squared'], cmap = cmap_r2)
    return styler
# Styling DataFrame
styled_df = (df_metrics.style.pipe(apply_custom_gradient).applymap(lambda _: 'font-weight: bold', subset = pd.IndexSlice[:, ['Model']]))
styled_df
| | Model | MSE | MAE | RMSE | R-squared |
|---|---|---|---|---|---|
| 0 | Linear Regression | 3996.617275 | 47.694544 | 63.218805 | 0.613273 |
| 1 | Decision Tree | 3435.458526 | 42.119627 | 58.612785 | 0.667573 |
| 2 | Random Forest | 2435.355771 | 34.107259 | 49.349324 | 0.764346 |
| 3 | XGBoost | 2395.020410 | 33.737187 | 48.938946 | 0.768249 |
Model Comparison¶
In this analysis, we compared the performance of four regression models: Linear Regression, Decision Tree, Random Forest, and XGBoost. The following metrics were used to evaluate the models: Mean Squared Error (MSE), Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and R-squared (R²).
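For reference, the four metrics can be computed from their definitions in a few lines of plain Python. This is a minimal sketch on made-up numbers; `regression_metrics` is an illustrative helper, while the notebook itself uses the scikit-learn functions.

```python
import math

def regression_metrics(y_true, y_pred):
    """Compute MSE, MAE, RMSE and R² from their definitions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e ** 2 for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    ss_res = sum(e ** 2 for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)    # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return {'MSE': mse, 'MAE': mae, 'RMSE': rmse, 'R2': r2}

y_true_demo = [100, 150, 200, 250]   # made-up actual prices
y_pred_demo = [110, 140, 190, 260]   # made-up predictions
metrics_demo = regression_metrics(y_true_demo, y_pred_demo)
print(metrics_demo)
```

MSE and RMSE penalize large errors more heavily than MAE, and R² compares the model's squared error with that of always predicting the mean price.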
1. Linear Regression:¶
- MSE: 3996.6173
- MAE: 47.6945
- RMSE: 63.2188
- R-squared: 0.61
Linear Regression shows a moderate R-squared value of 0.61, indicating that the model can explain 61% of the variance in the target variable. The lower MSE, MAE, and RMSE compared to the previous analysis suggest improved accuracy, but it still lags behind more complex models. This indicates that while a linear relationship captures more variance than before, it may not be the best fit for the data.
2. Decision Tree:¶
- MSE: 3435.4585
- MAE: 42.1196
- RMSE: 58.6128
- R-squared: 0.67
The Decision Tree model improved significantly compared to the previous version, achieving an R-squared value of 0.67 and explaining 67% of the variance. Its error metrics (MSE, MAE, RMSE) have also decreased, indicating better predictions. However, it is still outperformed by ensemble methods like Random Forest and XGBoost.
3. Random Forest:¶
- MSE: 2435.3558
- MAE: 34.1073
- RMSE: 49.3493
- R-squared: 0.76
The Random Forest model continues to show strong performance, with an R-squared value of 0.76, explaining 76% of the variance in the target variable. Its lower MSE, MAE, and RMSE metrics reflect its ability to make more accurate predictions. This highlights the benefits of ensemble learning and the robustness of the model for this task.
4. XGBoost:¶
- MSE: 2395.0204
- MAE: 33.7372
- RMSE: 48.9390
- R-squared: 0.77
XGBoost emerges as the top-performing model, slightly surpassing Random Forest with an R-squared value of 0.77, explaining 77% of the variance in the target variable. Its MSE, MAE, and RMSE are the lowest among all models, indicating the highest accuracy. This showcases XGBoost’s ability to optimize predictions effectively, making it a strong choice for this dataset.
Summary:¶
- XGBoost performed the best with the highest R-squared value of 0.77, making the most accurate predictions with the lowest error metrics.
- Random Forest was a close second, with an R-squared value of 0.76, demonstrating strong predictive power and low error metrics.
- Decision Tree improved significantly, achieving an R-squared value of 0.67, but it was outperformed by the ensemble methods.
- Linear Regression, while improved, had the lowest R-squared value of 0.61, indicating it was the least effective in capturing the underlying relationships in the data.
Conclusion: Based on the evaluation metrics, XGBoost now emerges as the best-performing model for this price prediction task, demonstrating the highest accuracy and explaining the most variance in the target variable.
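To put the gap between models in perspective, the sketch below expresses XGBoost's RMSE as a percentage reduction relative to the other models, using the rounded RMSE values reported above. The helper name `pct_lower` is illustrative.

```python
# Rounded RMSE values reported in the comparison above
rmse = {
    'Linear Regression': 63.2188,
    'Decision Tree': 58.6128,
    'Random Forest': 49.3493,
    'XGBoost': 48.9390,
}

def pct_lower(value, reference):
    """How much lower `value` is than `reference`, in percent."""
    return 100 * (reference - value) / reference

best_model = min(rmse, key = rmse.get)
gap_vs_rf = pct_lower(rmse['XGBoost'], rmse['Random Forest'])
gap_vs_lr = pct_lower(rmse['XGBoost'], rmse['Linear Regression'])
print(best_model, round(gap_vs_rf, 2), round(gap_vs_lr, 2))
```

XGBoost's RMSE is less than 1% below Random Forest's but more than 20% below Linear Regression's, so the two ensemble models are close to each other on this test split.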
Marketing Strategies¶
1. Existing Hosts:¶
Benefit of Using the Price Prediction Model: Existing hosts can leverage the price prediction model to optimize their pricing based on several factors, such as market trends, listing features, and demand. By plugging in their property details (e.g., location, room type, amenities, etc.), the model can provide a competitive price estimate, allowing them to adjust their pricing dynamically.
How It Can Help:
- Dynamic Pricing: If an existing host wants to adjust their rates based on current trends or events (like holidays, local festivals, or seasonality), they can input their current listing details into the model and receive an adjusted price that aligns with the market.
- Competitive Analysis: The price prediction model helps hosts understand the ideal pricing for their listing in comparison to similar properties. If their price is too high or low compared to similar listings, they can adjust to stay competitive.
- Revenue Maximization: By adjusting prices based on model predictions, hosts can maximize their revenue. The model helps avoid underpricing (leading to missed revenue) or overpricing (leading to fewer bookings).
2. Potential Guests:¶
Benefit of Using the Price Prediction Model: Guests can use the price prediction model to understand the fair price for a property they’re interested in, helping them determine whether the listing is priced competitively based on its features and location.
How It Can Help:
- Price Comparison: Guests can input the details of properties they are considering and compare the predicted price with the listing’s actual price to ensure they are getting a fair deal.
- Budget Planning: By knowing the predicted price range for the type of property they want, guests can better plan their budget, especially for long stays or specific dates when prices might fluctuate due to demand.
- Identifying Overpriced Listings: Guests can use the price prediction model to check if a listing is marked up unfairly. If the model predicts a price much lower than the listed price, they can negotiate or opt for other more competitively priced options.
3. New Hosts:¶
Benefit of Using the Price Prediction Model: New hosts, with limited experience, can use the price prediction model to set the right price for their listings based on real market data. The model can serve as a guide to help them start strong, especially if they are unsure how to price their property effectively.
How It Can Help:
- Initial Pricing Guidance: New hosts can enter their property details into the model and receive a suggested price range for their listing, ensuring they start with a competitive price. This takes the guesswork out of pricing and gives them a benchmark to build on.
- Avoiding Underpricing or Overpricing: The model can help new hosts avoid the common pitfalls of underpricing (leading to missed revenue) or overpricing (leading to fewer bookings), especially when they are unfamiliar with market trends.
- Optimizing for Occupancy: New hosts can also use the model to predict prices based on seasonality and local demand, ensuring they don’t miss out on potential bookings. For example, the model can show them when to increase prices during peak seasons or offer discounts during off-peak periods.
Summary of How Each Group Can Use the Model:¶
- Existing Hosts: Use the model to optimize prices dynamically, maximize revenue, and ensure competitiveness in the market by adjusting prices based on market demand and similar listings.
- Potential Guests: Use the model to compare prices across different listings, ensuring they are getting a good deal and staying within their budget. They can also check if the listings are priced fairly compared to the market.
- New Hosts: Use the model to set initial pricing based on data, avoiding common mistakes like underpricing or overpricing. This ensures they start their journey with a competitive price that aligns with market trends and demand.
The price prediction model ultimately helps both hosts and guests make data-driven decisions, ensuring that hosts maximize revenue and guests get good value, which is critical in a competitive market like Airbnb.
Visualization¶
vis_data = model_data.copy()
# Creating a `date` feature from the host's join date (`host_since`) in the original data source
vis_data['date'] = pd.to_datetime(df['host_since'])
1. Actual VS Predicted (Prices)¶
# Defining a temporary Data Frame for Actual & Predicted values
temp_df = pd.DataFrame({
    'Actual': y_test,
    'Predicted': y_pred_rf
})
# Visualization - ScatterPlot
fig = px.scatter(temp_df, x = 'Actual', y = 'Predicted', title = "Actual vs Predicted Prices")
# Endpoints of the `ideal line` (Predicted = Actual), covering the range of both axes
line_min = min(temp_df['Actual'].min(), temp_df['Predicted'].min())
line_max = max(temp_df['Actual'].max(), temp_df['Predicted'].max())
# Visualization - Adding the `ideal line`
fig.add_scatter(x = [line_min, line_max],
                y = [line_min, line_max],
                mode = 'lines', name = 'Ideal Line', line = dict(dash = 'dash', color = 'red'))
# Customizing the background color
fig.update_layout(
    plot_bgcolor = '#dfedff',
    paper_bgcolor = '#dfedff',
    width = 1000,
    height = 300
)
# Showing
fig.show()
2. Reviews Importance in Price¶
# Getting all the features required
correlation_matrix = vis_data[['price', 'review_scores_accuracy', 'review_scores_cleanliness',
'review_scores_checkin', 'review_scores_communication',
'review_scores_location', 'review_scores_value']].corr()
# Figure Size
plt.figure(figsize = (10, 5))
# Visualization - Heatmap
ax = sns.heatmap(correlation_matrix, annot = True, cmap = 'coolwarm', fmt = '.2f')
# Customizing the background color
fig = plt.gcf()
fig.patch.set_facecolor('#dfedff')
ax.set_facecolor('#dfedff')
# Labeling - Title
plt.title('Price vs. Review Scores Heatmap')
# Showing
plt.show()
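Each cell of the heatmap is a Pearson correlation coefficient, which is what `.corr()` computes by default. A minimal sketch of that calculation on made-up ratings and prices (`pearson_r` is an illustrative helper):

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

ratings_demo = [4.2, 4.5, 4.7, 4.9, 5.0]   # made-up review scores
prices_demo = [90, 120, 110, 150, 160]     # made-up prices
r_demo = pearson_r(ratings_demo, prices_demo)
print(round(r_demo, 2))
```

Values near 1 or -1 indicate a strong linear relationship and values near 0 a weak one. The coefficient only measures linear association, so a low value does not rule out a non-linear effect of reviews on price.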
3. Price by Accommodates¶
# Figure Size
plt.figure(figsize = (10, 5))
# Visualization - Barplot
ax = sns.barplot(x = 'accommodates', y = 'price', data = vis_data, estimator = 'mean')
# Customizing the background color
fig = plt.gcf()
fig.patch.set_facecolor('#dfedff')
ax.set_facecolor('#dfedff')
# Labeling
plt.title('Average Price by Accommodates')
plt.xlabel('Accommodates')
plt.ylabel('Average Price')
# Showing
plt.show()
4. Seasonal Price Trend¶
# Function - In order to get Seasons based on the months
def get_season(month):
    if month in [12, 1, 2]:
        return 'Winter'
    elif month in [3, 4, 5]:
        return 'Spring'
    elif month in [6, 7, 8]:
        return 'Summer'
    else:
        return 'Fall'
# Creating a new feature `season` using the `get_season` function
vis_data['season'] = vis_data['date'].dt.month.apply(get_season)
# Figure Size
plt.figure(figsize = (10, 2))
# Visualization - Lineplot
ax = sns.lineplot(x = 'season', y = 'price', data = vis_data, estimator = 'mean', errorbar = None)
# Customizing the background color
fig = plt.gcf()
fig.patch.set_facecolor('#dfedff')
ax.set_facecolor('#dfedff')
# Labeling
plt.title('Price Trends Over Seasons')
plt.xlabel('Season')
plt.ylabel('Average Price')
# Customizing the x-axis (Season Names)
plt.xticks(['Winter', 'Spring', 'Summer', 'Fall'])
# Showing
plt.show()
5. Monthly Price Trend¶
# Extracting months from date feature
vis_data['month'] = vis_data['date'].dt.month
# Figure Size
plt.figure(figsize = (10, 2))
# Visualization - Lineplot
ax = sns.lineplot(x = 'month', y = 'price', data = vis_data, estimator = 'mean', errorbar = None)
# Customizing the background color
fig = plt.gcf()
fig.patch.set_facecolor('#dfedff')
ax.set_facecolor('#dfedff')
# Labeling
plt.title('Price Trends Over Time (Monthly)')
plt.xlabel('Month')
plt.ylabel('Average Price')
# Customizing the x-axis (Month Names)
plt.xticks(range(1, 13), ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])
# Showing
plt.show()
6. Weekly Price Trend¶
# Extracting Weekdays
vis_data['weekday'] = vis_data['date'].dt.weekday
# Figure Size
plt.figure(figsize = (10, 2))
# Visualization - Lineplot
ax = sns.lineplot(x = 'weekday', y = 'price', data = vis_data, estimator = 'mean', errorbar = None)
# Customizing the background color
fig = plt.gcf()
fig.patch.set_facecolor('#dfedff')
ax.set_facecolor('#dfedff')
# Labeling
plt.title('Price Trends Over Weekdays')
plt.xlabel('Day of the Week')
plt.ylabel('Average Price')
# Customizing the x-axis (Weekday Names)
plt.xticks(range(0, 7), ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'])
# Showing
plt.show()
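The three trend plots group prices by season, month, and weekday, all derived from the `date` column. The sketch below reproduces those derivations with the standard library only; `SEASON_BY_MONTH` and `date_parts` are illustrative names. Like the pandas `.dt.weekday` accessor, `date.weekday()` counts Monday as 0, which is why the tick labels above start with 'Mon'.

```python
from datetime import date

# Same month-to-season mapping as `get_season`
SEASON_BY_MONTH = {
    12: 'Winter', 1: 'Winter', 2: 'Winter',
    3: 'Spring', 4: 'Spring', 5: 'Spring',
    6: 'Summer', 7: 'Summer', 8: 'Summer',
    9: 'Fall', 10: 'Fall', 11: 'Fall',
}

def date_parts(d):
    """Return (season, month, weekday) for a date, with Monday = 0."""
    return SEASON_BY_MONTH[d.month], d.month, d.weekday()

new_year = date_parts(date(2024, 1, 1))    # a Monday in winter
july_day = date_parts(date(2024, 7, 6))    # a Saturday in summer
print(new_year, july_day)
```

Since `date` is built from `host_since`, these groupings describe when the host joined the platform rather than when a stay was booked, so the trends should be interpreted with that in mind.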